A paper published in Nature by a team from the National Human Genome Research Institute (NHGRI), USA, has used comparative genomics as a tool to elucidate the human genome. The pioneering study compared the sequence of the same large genomic region in thirteen vertebrate genomes. The comparison revealed some functionally important parts of the human genome that were previously unknown.
The researchers compared the patterns of transposon insertions among different species including human, chimpanzee, baboon, cat, dog, cow, pig, rat, mouse, chicken, zebrafish and two species of pufferfish (Fugu, Tetraodon). The investigators were able to address a heated controversy in the field of evolutionary genomics. Their analyses confirmed recently proposed trees of mammalian evolution indicating that primates (human, chimpanzee, baboon) are more closely related to rodents (mouse, rat) than to carnivores (cat, dog) or artiodactyls (cow, pig). Indeed, the evidence revealed by the new sequence data refutes alternative evolutionary trees that place rodents much farther away from primates.
Dr Eric Green's team analyzed a genomic region containing 10 previously identified genes, the best known being the gene mutated in cystic fibrosis. However, an important discovery emanating from their multi-species comparative sequence analyses was the presence of substantial numbers of previously unidentified DNA segments that are conserved across a wide range of species, but which, unlike genes, do not code for proteins. Most of these conserved, non-coding regions could be uncovered only by using the sequences from multiple species. Indeed, they are not readily apparent when just two species' sequences are compared, e.g., those from human and mouse. While the precise function of these conserved elements is not yet known, there is evidence that they reflect non-coding sequences that have biological roles.
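The reasoning behind the multi-species advantage can be illustrated with a toy sketch. The Python example below is not the method used in the study; it is a minimal illustration under stated assumptions. The four short aligned sequences are invented for the example, and the function name conserved_columns is hypothetical. It simply reports alignment positions at which every chosen species carries the same base. With only two closely related species, almost every position looks conserved, so constrained elements do not stand out; adding more distant species narrows the set.

```python
from typing import Dict, List

# Invented toy alignment (NOT real genomic data): four equal-length,
# already-aligned sequences labelled with species names for illustration.
TOY_ALIGNMENT: Dict[str, str] = {
    "human":   "ACGTTGCAAGCT",
    "mouse":   "ACGTTGCATGCT",
    "chicken": "ACGTAGGATCCA",
    "fugu":    "ACGTCTGACCGA",
}


def conserved_columns(alignment: Dict[str, str], species: List[str]) -> List[int]:
    """Return 0-based column indices where all chosen species share one base.

    Columns containing a gap character ('-') are never reported as conserved.
    """
    seqs = [alignment[name] for name in species]
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("aligned sequences must all have the same length")
    conserved = []
    for i in range(length):
        column = {seq[i] for seq in seqs}
        if len(column) == 1 and "-" not in column:
            conserved.append(i)
    return conserved


if __name__ == "__main__":
    pairwise = conserved_columns(TOY_ALIGNMENT, ["human", "mouse"])
    multi = conserved_columns(TOY_ALIGNMENT, ["human", "mouse", "chicken", "fugu"])
    print("human vs mouse:", pairwise)
    print("all four species:", multi)
```

In this invented example the human-mouse comparison flags 11 of 12 positions as identical, whereas the four-species comparison leaves only five. Real analyses would rely on proper alignment tools and statistical models of sequence evolution rather than exact identity.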
Francis S. Collins, director of NHGRI, commenting on the study, said, "This analysis provides convincing evidence that the sequence of the genomes of a wide range of organisms, from chimpanzees to zebrafish, provides powerful insight into the understanding of our own genome. When it comes to elucidating the biological functions encoded by genomes, it is now clear that there is strength in numbers." Dr Green, the lead author of the study, added, "Our efforts have produced the largest data set of evolutionarily diverse genomic sequence generated to date. By focusing on targeted genomic regions, but sequencing them in multiple species, we are getting a previously unavailable glimpse through the window of vertebrate genome evolution." Another co-author, Dr Webb C. Miller, a computer scientist at Penn State University, commented, "Our studies demonstrate that an important route for identifying functional elements in the human genome will be sequencing the genomes of a menagerie of animals, not just two or three species, but many species that represent a wide sampling of the evolutionary tree. This study was just the beginning of a reconnaissance expedition, but it clearly illustrates why we need to explore many other animals' genomes to identify highly conserved sequences that reflect the functional parts of the vertebrate genetic blueprint."
This view was echoed by another co-author of the study, Dr David Haussler, of UCSC (USA). He said, "Not only is this data leading us to a better fundamental understanding of molecular evolution in vertebrate species, it is also guiding the way to the development of methods that use the evolutionary record itself to highlight functionally critical regions of the human genome." The UCSC team has constructed a specialized component of its Web site, http://www.genome.ucsc.edu, for viewing the sequences generated from the multiple species, as well as for examining the results of the comparative analyses reported in the study.
The use of multi-species sequences for identifying functionally important regions of the human genome, as described in the Nature paper, will be a prominent component of another NHGRI-sponsored program called the ENCyclopedia Of DNA Elements (ENCODE) project. The ultimate goal of the ENCODE project is to catalogue all functional elements in the human genome sequence, thereby deepening our understanding of human biology and stimulating the development of new strategies for preventing and treating disease.
The final set of findings reported in the study revealed that, while the general types of genome changes were similar among all vertebrates studied, differences in the relative contributions of the various changes have uniquely sculpted each species' genome. The researchers emphasized that because their findings pertain to just a single genomic region, they will need to conduct analyses of additional regions to get a broader perspective. Indeed, the targeted genomic region studied in the Nature paper represents just the first of more than 100 genomic regions being sequenced in multiple species and analyzed by the NHGRI program known as the NIH Intramural Sequencing Center (NISC) Comparative Sequencing Program. This broader effort, which is led by Dr Green, seeks to push the frontiers of genome sequencing by taking a detailed look at the similarities and differences of the same stretch of DNA among multiple species. This program is specifically designed to complement the efforts of larger sequencing centers, which typically sequence the entire genome of an individual species, such as the rat, and then conduct relatively broad-brush analyses comparing one whole-genome sequence with another.
Dr Green commented, "The findings we report in the Nature paper are just the tip of the iceberg, a sneak preview of the future, when we will have genome sequences from many, many organisms." His program is now generating sequences from more than 30 vertebrate species, including representatives from relatively exotic evolutionary branches, such as marsupials and monotremes.
The multi-species sequencing of targeted regions of the human genome is expected to serve as a test bed for guiding decisions about which animals should be next in line for whole-genome sequencing. At present, it typically costs more than $50 million to sequence an entire genome of a vertebrate, far more than is needed to 'sample' a few targeted regions of its genome in an effort to get a preliminary glimpse.
In addition to completing the human genome sequence, researchers involved in the Human Genome Project have sequenced the genomes of a number of organisms commonly used in biomedical research, including the bacterium Escherichia coli, the yeast Saccharomyces cerevisiae, two types of roundworm, two types of fruit fly, two types of sea squirt, two types of pufferfish, the mouse and the rat. NHGRI-supported researchers are now sequencing the genomes of the chimpanzee, the honeybee, the sea urchin, the chicken, the rhesus macaque, the dog and a set of nine fungi.
The paper describing the comparative genome approach was published in Nature (Volume 424, pp. 788-793). It is entitled 'Comparative analyses of multi-species sequences from targeted genomic regions' and is by J. W. Thomas et al.