The 1000 Genomes Project promises to provide genetic clues to all the major ailments plaguing humankind.
For much of the history of science, researchers have relied on tact and finesse in their investigations of Nature. They designed ingenious experiments and constructed exquisite theories to probe Nature’s patterns. Now some of them are combining finesse with brute force, and in the process uncovering some of Nature’s most profound mysteries.
At the Wellcome Trust Sanger Institute near Cambridge in the United Kingdom, biologists are applying brute force on a scale never before seen in biology. They are sequencing genomes (a genome is a person’s complete set of DNA, including all of their genes) at breakneck speed: about 300 million bases of DNA an hour, seven billion a day, 50 billion a week. In the last six months, scientists there have sequenced more than one trillion letters of genetic code, the equivalent of roughly 300 human genomes. Every two minutes, the institute generates as much sequence as was produced in the first five years of genome mapping (1982 to 1987).
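The throughput figures above hang together, as a quick back-of-envelope calculation shows. The Python sketch below is illustrative only: the genome length of about 3.1 billion bases (one copy of the human genome) is an assumption, since the article does not state one, and the results are rounded estimates rather than figures from the institute.

```python
# Back-of-envelope check of the sequencing rates quoted in the article.
# ASSUMPTION: one copy of the human genome is taken as ~3.1 billion bases;
# the article itself does not give a genome length.
BASES_PER_HOUR = 300_000_000
GENOME_LENGTH = 3_100_000_000  # assumed, approximate

per_day = BASES_PER_HOUR * 24                # bases per day
per_week = per_day * 7                       # bases per week
per_two_minutes = BASES_PER_HOUR * 2 // 60   # bases every two minutes

# How many genome-lengths fit into one trillion sequenced bases.
genomes_in_a_trillion = 1_000_000_000_000 / GENOME_LENGTH

print(f"per day:   {per_day / 1e9:.1f} billion bases")
print(f"per week:  {per_week / 1e9:.1f} billion bases")
print(f"per 2 min: {per_two_minutes / 1e6:.0f} million bases")
print(f"1 trillion bases is about {genomes_in_a_trillion:.0f} genome-lengths")
```

At 300 million bases an hour, a day comes to 7.2 billion bases and a week to about 50.4 billion, in line with the rounded figures quoted, and a trillion bases spans a little over 300 genome-lengths.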
Sequencing at such a speed, which is itself set to keep rising each year, is bringing biologists closer to answering some critical questions. At a fundamental, even philosophical, level, the work will tell us why we are all so similar and yet so different. At a more practical level, it will tell us why some of us get sick while others don’t or, to be precise, how genetic variation contributes to disease. Says Richard Durbin, co-leader of the three-year 1000 Genomes Project, which the institute launched with two other institutions: “At the end of the project, we will have a much clearer picture of what the human genome really looks like.”
The first draft of the human genome, produced by US and UK scientists in 2000, was a major breakthrough in biology. However, the draft had many gaps, and many of them have still not been filled. It turns out that these gaps contain crucial data for understanding health and disease. Moreover, the draft provided only primary data: a single reference sequence. It is the secondary data, the variations between individuals relative to that reference sequence, that will reveal risk factors for diseases. That is what biologists are after now.
The 1000 Genomes Project was launched in January this year with the aim of producing a medically relevant map of the human genome. Three institutions run the project: the Wellcome Trust Sanger Institute, the Beijing Genomics Institute in Shenzhen, China, and the National Human Genome Research Institute in Bethesda, Maryland, in the US. Three US-based companies, 454 Life Sciences, Illumina and Applied Biosystems, later joined the project by providing sequencing equipment. This equipment has been developed only recently and has not yet been tested in actual research. It provides what biologists call next-generation sequencing technology.
The power of this technology was unimaginable even two years ago. At that time the institute had 75 machines and could sequence 50 billion bases a year. Now it has 25 machines and can sequence 50 billion bases a week. “We had a major shift in technology last year,” says Harold Swerdlow, head of sequencing technology at the Wellcome Trust Sanger Institute. “The speed of sequencing has gone up 100 times and the cost has gone down by 100 times.”
Without this improvement in technology, the 1000 Genomes Project might not have been possible, or would have taken too long. Under the current plan, the first year is a pilot project with two goals: learning to work with the technology, and testing the technology itself. Scientists in the project are now sequencing the DNA of 180 people in three equal sets of 60: people of European origin (sampled in Utah, US), Africans (sampled in Nigeria) and East Asians (sampled in China and Japan). The sequencing is done at low depth. Depth, or coverage, is the term biologists use for the average number of times each base of DNA is read; the greater the depth, the more accurate the sequence. By the end of the project, the scientists expect to have sequenced 1000 genomes with an accuracy not available so far. To reach this stage, they would have had to sequence each genome about 40 times.
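The idea of depth can be made concrete with a little arithmetic. The Python sketch below computes average depth as total bases sequenced divided by genome length, and uses the textbook Lander-Waterman (Poisson) approximation to estimate how much of a genome is never read at a given depth. Both are idealised models, not the project’s own pipeline; the 3.1-billion-base genome length and the 4x value standing in for “low depth” are assumptions chosen for illustration.

```python
import math

# ASSUMPTION: one copy of the human genome is taken as ~3.1 billion bases.
GENOME_LENGTH = 3_100_000_000


def depth(total_bases_sequenced: float, genome_length: int = GENOME_LENGTH) -> float:
    """Average depth: how many times, on average, each base is read."""
    return total_bases_sequenced / genome_length


def fraction_unread(mean_depth: float) -> float:
    """Lander-Waterman (Poisson) estimate of the share of bases that no read
    covers, assuming reads fall uniformly at random across the genome."""
    return math.exp(-mean_depth)


# One week of output at the rate quoted earlier (50 billion bases),
# expressed as depth over a single genome.
print(f"50 billion bases = {depth(50e9):.1f}x depth over one genome")

# ASSUMPTION: 4x stands in for "low depth"; 40x is the figure in the article.
for d in (4, 40):
    print(f"{d}x depth: about {fraction_unread(d):.1e} of the genome never read")
```

Under this model, 4x depth leaves nearly 2 per cent of bases unread, while at 40x practically every base is read many times over, which is why greater depth yields a more accurate sequence.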
The existing map of human genetic variation is called the HapMap. Using it, scientists have already found about 130 sites of genetic variation that can increase the risk of diabetes, breast cancer, arthritis, inflammatory bowel disease and other conditions. However, the HapMap captures only variants that occur at a frequency of 5 per cent or more. The 1000 Genomes Project will identify variants at a frequency of 1 per cent or even less, which should open up possibilities for developing markers and treatments for a large number of diseases. Says Samir Brahmachari, a biologist and director general of the Council of Scientific and Industrial Research, New Delhi: “If the physical traits of the sequenced individuals are studied and correlated with their genome, the 1000 genome sequence can be an invaluable resource.”
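Why does finding rarer variants call for more genomes? A simplified calculation helps. Assuming each person carries two copies of every chromosome, people are sampled at random, and every copy carrying a variant is detected perfectly (none of which is exactly true in practice), the chance that a variant with population frequency p shows up at least once among n people is 1 - (1 - p)^(2n). The Python sketch below evaluates this idealised formula. The sample sizes come from the article (one pilot panel of 60 people, and 1000 people); the model itself, and the 0.1 per cent frequency, are assumptions for illustration, not the project’s statistical method.

```python
def prob_seen(freq: float, people: int) -> float:
    """Chance that a variant at population frequency `freq` appears at least
    once among `people` individuals, i.e. 2 * people chromosome copies.
    Idealised: random sampling and perfect detection are assumed."""
    copies = 2 * people
    return 1 - (1 - freq) ** copies


# Frequencies: 5% (the HapMap threshold), 1%, and an assumed rarer 0.1%.
for freq in (0.05, 0.01, 0.001):
    small = prob_seen(freq, 60)    # one pilot panel of 60 people
    large = prob_seen(freq, 1000)  # 1000 people
    print(f"frequency {freq:.1%}: 60 people {small:.2f}, 1000 people {large:.2f}")
```

Under these assumptions, a 60-person panel has only about a 70 per cent chance of containing even one copy of a 1 per cent variant, and about 11 per cent for a 0.1 per cent variant, while 1000 people make both very likely. Real detection also depends on sequencing depth and on seeing a variant more than once, so these numbers are optimistic.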
Source: The Telegraph (Kolkata, India)