
The Human Genome

Dr Sandeep Kumar Bagga, Monday, July 8, 2002, 08:00 Hrs [IST]

Human genome contains 3,164.7 million chemical nucleotide bases (A, C, T, and G). The average gene consists of 3,000 bases, but sizes vary greatly, with the largest known human gene being dystrophin at 2.4 million bases. The total number of genes is estimated at 30,000 to 40,000, much lower than previous estimates of 80,000 to 140,000, which had been based on extrapolations from gene-rich areas rather than on a composite of gene-rich and gene-poor areas. The functions are unknown for more than 50% of discovered genes.

At least 18 countries, namely Australia, Brazil, Canada, China, Denmark, European Union, France, Germany, Israel, Italy, Japan, Korea, Mexico, Netherlands, Russia, Sweden, United Kingdom, and the United States, are involved in the Human Genome Project. More information is available at www.ornl.gov/hgmis/.

U: Up to 99.9% of nucleotide bases are exactly the same in all people. Methods are being developed to detect different types of variation, particularly the most common type, called single-nucleotide polymorphisms (SNPs), which occur about once every 100 to 300 bases. Scientists believe SNP maps will help them identify the multiple genes associated with such complex diseases as cancer, diabetes, vascular disease, and some forms of mental illness. These associations are difficult to establish with conventional gene-hunting methods because a single altered gene may make only a small contribution to disease risk.
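
To make the SNP figures concrete, here is a minimal Python sketch. It assumes two short, already aligned sequences of equal length (both invented for illustration) and reports the positions where a single base differs. It also turns the "once every 100 to 300 bases" rate into a rough genome-wide count using the genome size quoted above. The function names are illustrative and do not come from any SNP-calling tool.

```python
GENOME_BASES = 3_164_700_000  # genome size quoted in this article


def find_snps(reference, sample):
    """Return (position, reference_base, sample_base) for each single-base difference.

    Assumes the two sequences are already aligned and have equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, r, s)
        for i, (r, s) in enumerate(zip(reference, sample))
        if r != s
    ]


def expected_snp_count(bases_per_snp):
    """Rough genome-wide SNP count for a rate of one SNP per `bases_per_snp` bases."""
    return GENOME_BASES // bases_per_snp


# Invented toy sequences, for illustration only.
reference = "ACGTTGCAAC"
sample = "ACGTAGCAAT"

print(find_snps(reference, sample))   # [(4, 'T', 'A'), (9, 'C', 'T')]
print(expected_snp_count(300))        # 10549000
print(expected_snp_count(100))        # 31647000
```

At the quoted rate, the genome would carry roughly 10 to 32 million SNP sites, which is why maps and automated methods are needed.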

DNA variants in genes involved in drug metabolism, particularly the cytochrome P450 multigene family, are the focus of much current research in this area.

M: Matching patterns, a very important area, can be achieved in many ways. The Basic Local Alignment Search Tool (BLAST) uses a maximal segment pair (MSP) measure. BLAST heuristically calculates the MSP to provide a measure of local similarity for any pair of sequences. FASTA gives results that fall between those of BLAST and a full Smith-Waterman algorithm. Other systems use methods such as Hidden Markov Models (HMMs) and Neural Networks (NNs) for matching patterns.

FASTA is superior to BLAST for translated DNA-protein comparison and DNA database searches because it calculates a single alignment that allows frame shifts. In contrast, BLAST searches each forward reading frame separately. By treating the forward reading frames as a single sequence, FASTA makes it much easier to produce high-quality alignments that extend the length of the protein sequence, resulting in improved sensitivity.
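
For readers who want to see what a local similarity score looks like in practice, the following Python sketch computes a Smith-Waterman score for two short sequences. It is a textbook version of the exhaustive method, not the heuristic used inside BLAST or FASTA. The scoring values (match +2, mismatch -1, gap -2) and the example sequences are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, end_in_a, end_in_b) for the best local alignment.

    Textbook dynamic programming with a linear gap penalty. The scoring
    values are illustrative assumptions, not defaults of any real tool.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,                                   # start a new local alignment
                score[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,               # gap in b
                score[i][j - 1] + gap,               # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    return best, best_pos[0], best_pos[1]


# Invented sequences sharing the 7-base segment ACGTACG.
best, end_a, end_b = smith_waterman_score("TTTACGTACGTTT", "GGGACGTACGGGG")
print(best)  # 14: seven matching bases at +2 each
```

The cost of filling the whole table grows with the product of the two sequence lengths, which is why heuristic tools are preferred for database searches.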

A: Analysis of gaps or insertions treats each one as a null, blank or wildcard while the rest of the sequence is matched. Tools such as BLAST or FASTA allow researchers the freedom to explore gap patterns and to understand the nature of gene structure. PSIPRED, a protein secondary structure prediction method, uses NNs to analyze output obtained from PSI-BLAST (Position-Specific Iterated BLAST).

Genes are in pairs; if some genes are missing, the gaps are revealed. Regardless of the gaps or insertions, the map is still created. This is what gap matching is.
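
A small Python sketch can show how a gap appears in an alignment. It uses the classic Needleman-Wunsch global alignment, swapped in here as a simple stand-in because neither BLAST nor FASTA works exactly this way. The scoring values (match +1, mismatch -1, gap -1) and the two sequences are assumptions for illustration. A missing base is written as "-", the blank placeholder described above.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); gaps are written as '-'.
    Scoring values are illustrative assumptions.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + substitution,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        substitution = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + substitution:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")   # base missing from b
            i -= 1
        else:
            top.append("-")      # base missing from a
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


score, aligned_a, aligned_b = global_align("GATTACA", "GATACA")
print(score)      # 5: six matches and one gap
print(aligned_a)
print(aligned_b)  # the shorter sequence padded with one '-'
```

The second sequence lacks one T, so the alignment inserts a single gap and still maps every other base to its partner.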

N: Number of proteins being large, improved algorithms can be used to produce multiple alignments and to extract sequence patterns or structural templates that define a family of proteins. Using this data, it is also possible to construct phylogenetic trees to trace the evolutionary path of proteins. Finally, with even more data, the information must be stored in large-scale databases. Comparisons become more complex, requiring multiple scoring schemes, and we are able to conduct genomic-scale censuses that provide comprehensive statistical accounts of protein features, such as the abundance of particular structures or functions in different genomes.

In a nutshell, the sequence of events is genome sequence --> pairwise comparison, sequence and structure alignment --> multiple alignment, patterns, templates and trees --> databases, scoring schemes and censuses.
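
As a small illustration of the "multiple alignment --> patterns" step, the Python sketch below derives a pattern from a handful of protein fragments. It assumes they have already been aligned by some other tool. Columns where enough sequences agree keep their residue, and all other columns become "x". The sequences, the function name and the threshold parameter are invented for illustration.

```python
from collections import Counter


def consensus_pattern(aligned, threshold=1.0):
    """Build a simple pattern from pre-aligned sequences of equal length.

    A column keeps its most common residue when that residue's share of
    the column is at least `threshold`; otherwise it becomes 'x'.
    Gap characters ('-') never count as conserved.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same aligned length")
    pattern = []
    for column in zip(*aligned):
        residue, count = Counter(column).most_common(1)[0]
        conserved = residue != "-" and count / len(column) >= threshold
        pattern.append(residue if conserved else "x")
    return "".join(pattern)


# Invented, pre-aligned protein fragments.
family = ["CAHCGK", "CSHCGR", "CTHC-K"]

print(consensus_pattern(family))                  # CxHCxx
print(consensus_pattern(family, threshold=0.6))   # CxHCGK
```

Lowering the threshold trades strictness for coverage, which is one reason several scoring schemes are needed as the data grows.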

G: Gene tests are available for several diseases, such as Alzheimer's disease, inherited breast and ovarian cancer, hereditary nonpolyposis colon cancer, cystic fibrosis, Duchenne muscular dystrophy/Becker muscular dystrophy, dystonia, Fanconi anemia (group C), Factor V-Leiden, Fragile X syndrome, hemophilia A and B, Huntington's disease, myotonic dystrophy, neurofibromatosis type 1, phenylketonuria, adult polycystic kidney disease, sickle cell disease, spinal muscular atrophy and thalassemias, and the list goes on.

Gene tests are being used to diagnose disease, confirm a diagnosis, provide prognostic information about the course of disease, confirm the existence of a disease in asymptomatic individuals, and, with varying degrees of accuracy, predict the risk of future disease in healthy individuals or their progeny.

E: Excellent progress toward clinical trials can be seen today. Closely linked to gene testing is gene therapy, and more than 500 clinical gene-therapy trials involving about 3,500 patients have already been identified worldwide. Another important area is fusogenic liposome-mediated gene transfer. Liposomes are sealed concentric vesicles made up of natural body components, namely cholesterol and phospholipids. The intact phospholipid bilayer fuses with the cell membrane and delivers the carried gene. Liposomes have been widely accepted as a potential targeted delivery system.

Contact GENETEST for comprehensive information on test availability at www.genetests.org/.

N: Novelty in correlating the human genome with drug design lies in the fact that, if we have a gene sequence, we can determine the protein sequence with certainty. From there, prediction algorithms can be used to calculate the structure adopted by the protein. Geometric calculations can define the shape of the protein's surface and molecular simulations can determine the force fields surrounding the molecule. Finally, using docking algorithms, one could identify or design ligands that may bind the protein, paving the way for designing a drug that specifically alters the protein's function.

Here, the sequence of drug design is genome sequence --> protein sequence --> protein structure --> protein surface --> force field --> ligand complex. It's a long and expensive process.
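
Only the first arrow of this chain, from gene sequence to protein sequence, is simple enough to sketch in a few lines. The Python example below translates a DNA coding sequence using the standard genetic code and stops at the first stop codon. It assumes the input is the coding strand with introns already removed. The example sequence is invented, and the later steps (structure prediction, surface calculation, docking) are not attempted here.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate a coding-strand DNA sequence into one-letter amino acids.

    `frame` (0, 1 or 2) selects the forward reading frame. Translation
    stops at the first stop codon; unrecognized codons become 'X'.
    """
    dna = dna.upper()
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*":   # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Invented coding sequence, for illustration only.
gene = "ATGGCCATTGTAATGGGCCGCTGA"

print(translate(gene))            # MAIVMGR
print(translate(gene, frame=1))   # WPL
```

Reading the same DNA in a different frame gives an entirely different protein, which is why frame handling matters so much in translated searches.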

O: Overall, biodiversity, bio-molecular structure, cell metabolism, downstream processing in chemical engineering, drug design, functional genomics, proteome analysis and vaccine design are some of the areas in which the human genome is an integral component.

Unprecedented volumes of genetic data are being generated and evaluated for a better tomorrow.

M: Medicines tailored to your genetic constitution are a realistic prospect, depending on how well human genome research can evaluate your genetic make-up. Biochips are already available that carry thousands of single-stranded pieces of DNA, called oligonucleotides, capable of reacting with specific sequences of your DNA. Just put a drop of your blood in a biochip machine and come back tomorrow. Thousands of reactions will take place and, as if by magic, your genotype will be displayed on the computer screen.

Human genome analysis indicates how many gene variants you carry that are directly related to a disease, and on the basis of this analysis you can be given the best possible medicines.
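
The principle behind the chip can be imitated in a few lines of Python. An oligonucleotide probe reacts with a DNA strand when it is the reverse complement of some stretch of that strand. The sketch below assumes perfect base pairing only and uses invented probe names and sequences. Real biochips measure hybridization chemically and tolerate partial matches, which this toy model ignores.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def hybridizing_probes(sample, probes):
    """Return the names of probes that pair perfectly with the sample strand.

    `probes` maps a probe name to its oligonucleotide sequence. A probe is
    reported when its reverse complement occurs somewhere in the sample.
    """
    sample = sample.upper()
    return [
        name
        for name, oligo in probes.items()
        if reverse_complement(oligo) in sample
    ]


# Hypothetical sample strand and probe set, invented for illustration.
sample_dna = "GGATCCTTAGCAAGT"
chip = {
    "variant_A": "TGCTAA",   # reverse complement is TTAGCA, present in the sample
    "variant_B": "TGCCAA",   # reverse complement is TTGGCA, absent from the sample
}

print(hybridizing_probes(sample_dna, chip))   # ['variant_A']
```

Repeating this test across thousands of probes at once is, in essence, how the chip reads out a genotype.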

E: Exon finders are available as server-operated programs, such as First Exon Finder (http://argon.cshl.org/FirstEF), alongside programs like BLAST from www.ncbi.nlm.nih.gov/BLAST and http://blast.wustl.edu/. These programs are written in the C language in order to use processor time efficiently.

The human genome is a landmark beginning, a crown.

----- The author is with Pharmaceutical Research and Clinical Trials, P.R.A.C.T. Advisory Service, Alexandria, VA (USA)
