Studying the human genome - the complete set of human genes - is a way of studying fundamental details about ourselves. The three billion letters of the human genome are written using the four-letter alphabet of DNA. The DNA is divided among 23 pairs of chromosomes that are found in each of the trillions of cells in our bodies. In 2003, the Human Genome Project produced a complete representative sequence of the human genome. Of course, people are not identical, and DNA sequences do differ subtly between individuals. Currently, a number of separate projects are charting sequence variations found in human populations.
The representative sequence is a composite from several people who donated blood samples. Originally, close to 100 people volunteered to give a sample of their blood. Each person provided their informed consent, affirming that they agreed to the study of their DNA. No names were attached to the blood samples and ultimately scientists used only a few of them. These measures ensured that the DNA sequences remained anonymous; not even the donors knew whether their samples were actually used or not.
The main goal of the Human Genome Project was to read, letter by letter, the three billion bases of human DNA. Before starting to sequence the human genome, scientists built maps of the chromosomes and developed and refined techniques for analyzing DNA. With the tools in place, project scientists began large-scale DNA sequencing in 1999. In just one year, they had amassed sequence data covering more than 80 percent of the genome.
The human genome is a massive text. If the three billion letters (or bases) of the genome were printed in telephone books, they would require a stack of books nearly as tall as the Washington Monument.
To accurately determine the sequence of every base in the genome, scientists needed to read the three billion bases not just once, but at least six to ten times. Individual sequencing reactions could only reveal the order of a few hundred bases of DNA at a time - amounting to a fraction of a page. This meant that to place in order all of the DNA bases, it was necessary to produce many thousands of overlapping segments of DNA sequence.
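The arithmetic behind this requirement can be sketched in a few lines. This is only a rough estimate: the read length of 500 bases is an assumption chosen for illustration (the text says only "a few hundred bases"), and the function name is hypothetical.

```python
GENOME_SIZE = 3_000_000_000  # bases in the human genome, as stated in the text


def reads_needed(coverage, read_length, genome_size=GENOME_SIZE):
    """Smallest number of reads whose combined length covers the
    genome `coverage` times over, ignoring how the reads overlap."""
    total_bases = coverage * genome_size
    # Ceiling division: a partial read still counts as one reaction.
    return -(-total_bases // read_length)


# Six-fold and ten-fold coverage with an assumed 500-base read.
for coverage in (6, 10):
    print(coverage, reads_needed(coverage, read_length=500))
# prints 36000000 for 6x and 60000000 for 10x
```

Under these assumptions, reading the whole genome six to ten times takes tens of millions of individual reads, which is why the work had to be automated and shared among many laboratories.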
To begin the project, researchers built maps of the human genome. They identified thousands of DNA sequence landmarks that helped them navigate across the chromosomes.
Developing genome maps was necessary preparation for DNA sequencing. These same maps also served to orient geneticists who were hunting for disease genes.
With enough landmarks in place, project scientists created "libraries" of clones that spanned the genome. Each clone contained a manageably small fragment of human DNA that was stored in bacteria. Scientists used the landmarks to tell them what part of the human genome each fragment came from.
This clone-by-clone approach made it possible to double check the location of each DNA sequence. It also allowed participating laboratories from around the world to carve up the genome and coordinate their work.
Building Libraries: Transcript
Clone libraries offered the same advantage as real libraries: orderly access to information. In most clone libraries, the DNA fragments were stored in E. coli. These are bacteria that normally live in our intestines. Each E. coli cell stored a single segment of human DNA and represented a single book of the library. Clone libraries allowed each human fragment to be tracked and easily copied.
The clone libraries were prepared using bacterial artificial chromosomes, or BACs. Each BAC clone contained 100,000 to 200,000 bases of DNA sequence. The large BAC clones were used to establish the order of the DNA sequences. To sequence the DNA, smaller-sized clones were needed. Project scientists cut the large BAC clones into smaller fragments of about 2,000 bases. These smaller fragments were typically stored in viruses called phage that can infect E. coli cells.
E. coli to Store and Copy DNA: Transcript
E. coli cells containing fragments of human DNA, or any other type of DNA, can be stored in freezers indefinitely. When scientists need to retrieve DNA from the library, they simply revive the cells by bringing them back up to 37 degrees Centigrade - gut temperature.
The E. coli cells act as copiers, producing many copies of the human DNA sequence that they contain. To prepare to sequence DNA, a clone of cells containing the same bit of human DNA is released into a rich, warm broth. The cells are shaken vigorously to provide them with air. This causes them to divide rapidly - about once every half hour. After incubating for just a single night, one third of a teaspoon of broth contains billions of E. coli cells and so, billions of copies of the particular fragment of human DNA they contained.
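The power of this copying step comes from repeated doubling. The sketch below is an idealized model: it assumes a culture started from a single cell, a fixed 30-minute doubling time, and unlimited nutrients, whereas real cultures level off as the broth is used up. The function name and the 16-hour "overnight" figure are assumptions for illustration.

```python
def cells_after(hours, doubling_minutes=30, starting_cells=1):
    """Idealized cell count after `hours` of unchecked doubling."""
    divisions = int(hours * 60 // doubling_minutes)
    return starting_cells * 2 ** divisions


# An assumed 16-hour overnight incubation gives 32 doublings.
print(cells_after(16))  # prints 4294967296
```

Even in this simplified picture, a single cell gives rise to billions of descendants overnight, each carrying a copy of the same human DNA fragment.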
Preparing DNA for Sequencing Reactions: Transcript
The next morning, the E. coli cells are broken open to release their DNA. The human DNA is separated from the cell debris and washed clean.
Now there are enough copies of the human DNA fragment to set up a sequencing reaction.
Sequencing Reactions: Transcript
A DNA sequencing reaction includes four main ingredients: "template" DNA copied by the E. coli; free bases, the building blocks of DNA, which come in four types; short pieces of DNA called "primers"; and DNA polymerase, the enzyme that copies DNA.
The chemical reaction that makes DNA in a test tube is similar to what happens in a living cell: both rely on DNA polymerase and, in both cases, DNA strands have a head end, which is called the 5' end, and a tail end, which is called the 3' end. A DNA strand can grow only from its 3' end.
Making DNA in cells and sequencing DNA in test tubes both depend on complementary base pairing. The building blocks on opposite strands of DNA pair specifically - a C always pairs with a G, and an A always pairs with a T.
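The pairing rule is simple enough to write down as a lookup table. The short sketch below (function names are illustrative) builds the complementary strand for a given sequence, and then reverses it, since the two strands of DNA run in opposite directions.

```python
# Each base pairs with exactly one partner: A with T, and C with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Base-by-base partner of `strand`, in the same left-to-right order."""
    return "".join(PAIRS[base] for base in strand)


def reverse_complement(strand):
    """The opposite strand, written from its own 5' end to its 3' end."""
    return complement(strand)[::-1]


print(complement("GATTACA"))          # prints CTAATGT
print(reverse_complement("GATTACA"))  # prints TGTAATC
```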
The primer sequence binds to its complementary sequence on the template DNA.
Free bases that match the template sequence can attach to the new strand's growing (3') end.
Among the free bases in the solution are a few that have a fluorescent dye attached to them. When a dye-bearing base attaches to the growing strand, it stops the new DNA strand from growing any further. A different colored dye is attached to each of the four kinds of bases.
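Because a dye-bearing base can stop growth at any position, the reaction ends up with one kind of fragment for every position in the new strand. The sketch below lists those fragments for a short example. The particular color assigned to each base is a hypothetical choice made for illustration, as are the function and variable names.

```python
# Hypothetical color assignment; the text says only that each base has its own dye.
DYE_COLORS = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}


def terminated_fragments(primer, new_strand):
    """List (length, color) for every fragment that could form when a
    dye-bearing base stops growth at each position of `new_strand`."""
    fragments = []
    for position, base in enumerate(new_strand, start=1):
        length = len(primer) + position  # primer plus bases added so far
        fragments.append((length, DYE_COLORS[base]))
    return fragments


print(terminated_fragments("AC", "GAT"))
# prints [(3, 'yellow'), (4, 'green'), (5, 'red')]
```

Each fragment's length records a position in the sequence, and its color records which base sits at that position.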
Products of Sequencing Reactions: Transcript
A completed sequencing reaction contains an array of colored DNA fragments. The shortest fragments correspond to the length of the primer plus one dye-colored base. The longest fragments are usually between 500 and 800 bases long, depending on when the sequencing reaction ran out of steam.
The products of sequencing reactions are fed into an automated sequencing machine. Automated sequencers have become increasingly sophisticated during the past decade. They can run more samples, process them more quickly, and are easier to operate.
Separating the Sequencing Products: Transcript
The DNA molecules produced during the sequencing reaction are separated from each other by a process called electrophoresis. DNA molecules are negatively charged. The sequencing machine sets up an electric field; all the DNA moves through a porous gel toward the positive electrode. The gel acts like a sieve; shorter DNA fragments move more quickly through the holes of the gel than do larger DNA fragments.
Reading the Sequencing Products: Transcript
As each DNA fragment reaches the end of the gel, a laser excites its fluorescent dye. A camera detects the color of the emitted light and passes that information to a computer. One by one, the machine records the colors of the DNA fragments that pass through the gel.
A single sequencing reaction can reveal the order of several hundred DNA bases.
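The separating and reading steps amount to sorting fragments by length and translating colors back into bases. Here is a minimal sketch of that idea, assuming each fragment is described by its length and dye color. The color assignment is hypothetical, and a real sequencing machine works from light signals rather than tidy labels.

```python
# Hypothetical color assignment, used only for this example.
COLOR_TO_BASE = {"green": "A", "blue": "C", "yellow": "G", "red": "T"}


def read_sequence(fragments):
    """Recover the base order from (length, color) pairs.

    Shorter fragments move through the gel faster, so they reach
    the laser first; sorting by length mimics that arrival order.
    """
    arrival_order = sorted(fragments, key=lambda fragment: fragment[0])
    return "".join(COLOR_TO_BASE[color] for _, color in arrival_order)


# Fragments can be listed in any order; the gel sorts them out.
print(read_sequence([(5, "red"), (3, "yellow"), (4, "green")]))  # prints GAT
```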
Assembling the Results: Transcript
A computer program integrates the data from individual sequencing reactions. It can spot where DNA fragments overlap and order them as they originally were on the chromosome.
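The idea of ordering fragments by their overlaps can be illustrated with a toy example. The sketch below repeatedly joins the two reads that share the longest overlap. It is a simplified greedy approach on error-free reads, not the assembly software used by the project, and all names in it are illustrative.

```python
def overlap(a, b, min_len=3):
    """Length of the longest end of `a` that matches the start of `b`."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge reads pairwise, always taking the largest overlap first."""
    reads = list(reads)
    while len(reads) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left overlaps; the pieces stay separate
        merged = reads[best_i] + reads[best_j][best_size:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


print(greedy_assemble(["TACCTGA", "ATGGCGT", "GCGTACC"]))
# prints ['ATGGCGTACCTGA']
```

Real genomes contain repeated stretches and reading errors, which is one reason the mapped clones described earlier were valuable for checking where each sequence belonged.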
Many overlapping sequence reads are needed to generate the uninterrupted sequence of the original stretch of DNA. During the Human Genome Project, every base pair of DNA was sequenced an average of nine times. Some stretches of DNA were easy to read and needed to be sequenced less often, while other stretches were more difficult to read and had to be sequenced more often.
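An average such as "nine times" comes from counting how many reads cover each position. The sketch below does this for a tiny made-up region, assuming each read is described by its starting position and length. The numbers are invented for illustration and are not project data.

```python
def coverage_depth(region_length, reads):
    """Count how many reads cover each position of a region.

    `reads` holds (start, length) pairs, with positions counted from 0.
    """
    depth = [0] * region_length
    for start, length in reads:
        for position in range(start, min(start + length, region_length)):
            depth[position] += 1
    return depth


# Three made-up reads over a 10-base region.
depth = coverage_depth(10, [(0, 6), (4, 6), (2, 5)])
print(depth)                    # prints [1, 1, 2, 2, 3, 3, 2, 1, 1, 1]
print(sum(depth) / len(depth))  # prints 1.7
```

The depth varies from position to position even though the average is a single number, which matches the observation that some stretches were sequenced more often than others.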
During the Human Genome Project, scientists ran more than 50 million sequencing reactions. Some 2,000 scientists from more than two dozen labs around the world worked on the project.
Working Draft Sequence: Transcript
Whenever a stretch of DNA that spanned 2,000 or more bases was assembled, it was placed into public databases within 24 hours. Anyone with access to the Internet could see and analyze the sequence data.
After sequencing the 3 billion letters in the human genome an average of nine times, the Human Genome Project had released DNA sequence for 99 percent of the genome. This finished sequence was 99.99 percent accurate. The project had completed all of its goals ahead of schedule and under budget.
The Human Genome Project also produced other advances, not expected to be accomplished until much later. These included an advanced draft of the mouse genome and an initial draft of the rat genome.
Medical researchers did not wait to use data from the Human Genome Project. When the project began in 1990, fewer than 100 human disease genes had been identified. At the project's conclusion in 2003, the number of identified disease genes had risen to more than 1,400.
The Human Genome Project focused on a single representative DNA sequence. The next step was to analyze DNA sequences from different populations. This catalog of human genetic variation was called the HapMap. Completed in 2005, the HapMap used single nucleotide polymorphisms, or SNPs, to identify large blocks of DNA sequence called haplotypes that tend to be inherited together. To use the data, researchers compare haplotypes between people with and without a disease. Haplotypes shared by people with the disease are then examined in detail to look for associated genes. Already, scientists have used HapMap data to identify a gene associated with age-related macular degeneration, a disease responsible for blindness among the elderly. It is expected that the HapMap will play an important role in identifying many more disease genes in the future.
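The comparison between people with and without a disease can be sketched as a frequency count. In the toy example below, each person is represented by one short haplotype string, and a haplotype is flagged when it is noticeably more common among people with the disease. The data, the names, and the 0.2 threshold are all invented for illustration; real studies rely on large samples and formal statistical tests.

```python
from collections import Counter


def haplotype_frequencies(haplotypes):
    """Fraction of the group carrying each haplotype."""
    counts = Counter(haplotypes)
    total = len(haplotypes)
    return {hap: n / total for hap, n in counts.items()}


def enriched_haplotypes(cases, controls, min_difference=0.2):
    """Haplotypes whose frequency among cases exceeds that among
    controls by at least `min_difference` (an arbitrary cutoff)."""
    case_freq = haplotype_frequencies(cases)
    control_freq = haplotype_frequencies(controls)
    return sorted(
        hap for hap, freq in case_freq.items()
        if freq - control_freq.get(hap, 0.0) >= min_difference
    )


cases = ["ACG", "ACG", "ACG", "TCA"]     # made-up people with the disease
controls = ["ACG", "TCA", "TCA", "TCA"]  # made-up people without it
print(enriched_haplotypes(cases, controls))  # prints ['ACG']
```

A flagged haplotype is only a starting point: as the text notes, the region is then examined in detail to look for associated genes.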