
Proteomics: Consequences of the "Sequence"

Santosh A Khedkar, Wednesday, October 9, 2002, 08:00 Hrs [IST]

Science is now moving from genomics to proteomics to gain insight into the functional network of gene expression. Proteomics is all about going from sequence to the ultimate consequence: the actual biological function of a protein, or a drug to inhibit that biological function. Proteomics is the science that studies proteins in general and, in particular, the changes they undergo as a result of various disorders or external factors such as toxic agents.

With the human genome sequence now being explored, scientists will frequently encounter sequences that produce proteins known to be important in disease but with unknown biological functions. Knowing the structures of numerous gene products, at least in part, will help us correlate structure with function. So, in the absence of any other information, we can infer the function of a protein by looking at its three-dimensional structure. In addition, the genomes of higher organisms are smaller than might be expected because multiple functions are grafted onto the same polypeptide chain.

Determining a protein's structure aids drug design because it makes it easier to identify molecules that fit into the protein's functional site or sites. We need a way of taking structures and probing them, computationally or experimentally, to say which sites matter and which moieties like to bind there. THEMATICS is a computational tool for identifying the residues in a protein active site that actually carry out chemistry. The idea is to carry out calculations on the protein structure and look for residues whose computed behavior as a function of pH is abnormal.
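As a rough illustration of what "abnormal behavior as a function of pH" could look like, the sketch below compares an ideal Henderson-Hasselbalch titration curve with an artificially flattened one and measures the width of the pH range over which the group is only partially protonated. This is not the THEMATICS algorithm itself: the function names, the pKa of 6.5 and the `hill` flattening factor are assumptions chosen purely for illustration.

```python
def protonated_fraction(ph, pka, hill=1.0):
    """Fraction of a titratable group that is protonated at a given pH.

    hill=1.0 gives the ideal Henderson-Hasselbalch curve. hill<1.0 is a
    hypothetical knob that flattens the curve to mimic a perturbed residue.
    """
    return 1.0 / (1.0 + 10.0 ** (hill * (ph - pka)))


def buffer_range_width(pka, hill=1.0, lo=0.1, hi=0.9, step=0.01):
    """Width of the pH range (scanned over pH 0 to 14) in which the
    protonated fraction lies between lo and hi."""
    n_steps = int(round(14.0 / step))
    count = 0
    for i in range(n_steps + 1):
        ph = i * step
        if lo <= protonated_fraction(ph, pka, hill) <= hi:
            count += 1
    return count * step


if __name__ == "__main__":
    ideal = buffer_range_width(6.5)            # ordinary residue (assumed pKa)
    odd = buffer_range_width(6.5, hill=0.5)    # artificially flattened curve
    print(f"ideal width: {ideal:.2f} pH units, flattened width: {odd:.2f} pH units")
```

A residue whose partially protonated range is much wider than the ideal curve predicts would be one candidate for closer inspection under this simplified picture.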

The effort to produce an index of all human proteins (human protein index, or HPI) began twenty years ago, before the initiation of the human genome program. Because DNA sequencing technology is inherently simpler and more scalable than protein analytical technology, and because the finiteness of genomes invited a spirit of rapid conquest, the notion of genome sequencing has displaced that of protein databases in the minds of most molecular biologists for the last decade.

From a drug discovery standpoint, the synergy between genomics and proteomics will help elucidate disease mechanisms, identify novel drug targets, and identify surrogate biomarkers that could be used in therapeutics. In the field of infectious diseases there is an urgent need for global approaches that can efficiently, precisely and integratively study structural and functional genomics and proteomics of microbial infections (infectomics). The combination of new (e.g. DNA and protein microarrays) and traditional approaches (e.g. cloning, PCR, gene knockout and knockin, and antisense) will help overcome the challenges we are facing today. It is assumed that the global phenotypic changes (infectomes) in microbes and their host during infections are encoded by the genomes of microbial pathogens and their hosts, expressed in certain environmental conditions devoted to specific microbe-host interactions. Global drug responses (pharmacomes) in microbes and their host can be detected by genomic and proteomic approaches.

Genomics versus Proteomics

It is known that proteomics is the protein equivalent of genomics and includes the study of gene expression at a functional level. The proteome of an organism is the protein complement of its genome. However, unlike the genome, the proteome is dynamic: it varies according to the cell type and the functional state of the cell. In addition, the proteome shows characteristic perturbations in response to disease and external stimuli. Proteomics combines state-of-the-art analytical methods with bioinformatics.

Proteomics is considered the most developed branch of functional genomics. A systems approach underlies the proteomic strategy of studying the protein products of gene expression, with the ultimate goal of drawing up complete protein indices (proteomes) for particular organisms, the human proteome in particular.

Over the next few years, it is anticipated that functional genomics and proteomics will have major impacts on the clinical phases of drug development. Expected benefits are earlier proof-of-concept studies in man and increased efficiency of clinical trials through the availability of biologically relevant markers for drug efficacy and safety.

The major challenge for post-genomic research is to functionally assign and validate a large number of novel target genes and their corresponding proteins. Functional genomics approaches have, therefore, gained considerable attention in the quest to convert this massive data set into useful information. One of the crucial components for the functional understanding of unassigned proteins is the analysis of their experimental or modeled 3D structures. Structural proteomics initiatives are generating protein structures at an unprecedented rate but our current knowledge of 3D-structural space is still limited. Estimates on the completeness of the 3D-structural coverage of proteins vary but it is generally accepted that only a minority of the structural proteome has a template structure from which reliable conclusions can be drawn. Thus, structural proteomics has set out to build a map of protein structures that will represent all protein folds included in the 'global proteome'. Proteomics is currently in a phase of technological development and establishment, and demonstrating the capacity for high throughput is a major challenge.

Experimentalism in Proteomics

Proteomics offers scientists unprecedented power for the analysis of the expressed genome. This power comes from well-defined experimental designs that can be easily reproduced, so that the proteome can be surveyed more confidently for relevant biological responses. Although the importance of defining the experimental state in proteomics might seem trivial or obvious, the concept deserves great emphasis. As in any scientific discipline, the power of proteomics is best realized when there are: 1) a clear hypothesis, 2) a strong understanding of the limitations of the technology, and 3) an opportunity to cross-validate findings using alternative experimental approaches.

Components of Proteomic Phenotypes

Although most proteins do not increase or decrease in concentration in response to an environmental shift, induced proteins and repressed proteins are equally interesting in proteomics experiments. The heat-shock response is ubiquitous in biological systems, and heat-shock proteins are the best examples of induced proteins. The function of these proteins was initially unknown but, in time, it was shown that heat-shock proteins act as molecular chaperones, which maintain other cellular proteins in physiological folding states at elevated temperature.

If, in adapting to a new environment, a cell no longer needs a given protein function, it can repress transcription, activate degradation or inactivate the protein through post-translational modification; as a result, the protein decreases in abundance. Such proteins are called repressed proteins. A protein's concentration may also be reduced by passive means, including general proteolysis, dilution (e.g. in dividing cells), or even leakage through the cell membrane. A good example of a repressed protein is the calcium-binding protein calbindin D, which decreases in abundance in the kidneys of only those humans suffering from cyclosporin-A-mediated nephrotoxicity.

Protein Resolution Techniques

The ideal protein separation technique for proteomics would offer ultra-high resolution: the ability to separate thousands of proteins (and their modified forms) simultaneously to homogeneity, enabling their subsequent characterization. It would also be cheap, would not be biased against any type of protein, would be available in kit form and, ideally, would allow separated proteins to be archived for future analysis. 2D-PAGE, combined with blotting onto membranes, satisfies these criteria.

Two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) continues to deliver high-quality protein resolution and dynamic range for the proteomics researcher. There are also two emerging technological approaches. The first involves global proteolytic digestion of a complex sample, followed by analysis of the resulting complex peptide mixture by tandem mass spectrometry (MS/MS), usually via an electrospray ionization interface.

The second technological approach, which is being developed to screen complex protein samples rapidly, involves the production of microarrays of antibodies, peptides or synthetic mimetic compounds. Low-density arrays of a subset of proteins or antibodies on filters can be used to study protein-protein or receptor-ligand interactions and affinities. High-density protein arrays or protein biochip technology combine affinity capture and high-resolution MS for BIAcore analysis to characterize proteins and the molecules with which they interact.
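The digestion step of the first approach can be mimicked in silico. The sketch below assumes the protease is trypsin, which the text does not specify, and applies the commonly used rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P). The function name and the example sequence are hypothetical.

```python
def tryptic_digest(sequence):
    """Split a protein sequence into peptides using the simple trypsin rule:
    cleave after K or R, except when the following residue is P.

    Assumes complete digestion (no missed cleavages), which real samples
    rarely achieve.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Keep the C-terminal peptide when the sequence does not end in K/R.
        peptides.append(sequence[start:])
    return peptides


if __name__ == "__main__":
    # Made-up sequence, for illustration only.
    print(tryptic_digest("MKAARPGKLLR"))  # ['MK', 'AARPGK', 'LLR']
```

In a real workflow the peptides produced by digestion would then be matched against measured MS/MS spectra; that step is outside the scope of this sketch.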

If 2D-PAGE is to remain the preferred method for protein separation and characterization, several key steps need to be implemented to ensure quality sample preparation and speed of analysis. In 2D-gel-based proteomics, experimental design must first incorporate parameters that consider the physiological properties of the organism being studied. However, experimental design is often constrained by resource limitations. The reagent, time and personnel resources needed to perform replicate measures of a proteomic phenotype scale with the required number of replicates. The lack of efficient software to facilitate the quantitative comparison of 2D-gel images can add weeks of analysis time to a rigorously controlled proteomic experiment. Nevertheless, the value of proteomic results is highly correlated with the experimentalists' confidence in those results.

Incorporating the database concept into proteomics is particularly important for designing definitive experiments. A database of proteomic phenotypes allows experimenters to distinguish between significant and artifactual results. A database also helps to establish the degree of experimental variation that is intrinsic to the system under study, so that significant changes can be discerned. Most importantly, a well-designed database enables investigators to identify proteomic signatures.
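One simple way to use stored replicates to separate intrinsic variation from significant change is sketched below. It is a minimal illustration, assuming each phenotype is stored as a mapping from protein name to measured abundance; the z-score cut-off of 3.0, the function name and all abundance values are assumptions, not values from any real experiment.

```python
from statistics import mean, stdev


def significant_changes(control_replicates, treated, threshold=3.0):
    """Flag proteins whose abundance in `treated` lies far outside the
    variation seen across `control_replicates`.

    control_replicates: list of dicts mapping protein name -> abundance
    treated:            dict mapping protein name -> abundance
    Returns a dict of protein name -> z-score for the flagged proteins.
    """
    flagged = {}
    for protein, value in treated.items():
        baseline = [rep[protein] for rep in control_replicates if protein in rep]
        if len(baseline) < 2:
            continue  # variation cannot be estimated from fewer than 2 replicates
        spread = stdev(baseline)
        if spread == 0:
            continue  # avoid division by zero on identical replicates
        z = (value - mean(baseline)) / spread
        if abs(z) >= threshold:
            flagged[protein] = z
    return flagged


if __name__ == "__main__":
    # Made-up spot intensities for two E. coli heat-shock proteins.
    controls = [
        {"DnaK": 100, "GroEL": 200},
        {"DnaK": 110, "GroEL": 190},
        {"DnaK": 90, "GroEL": 210},
    ]
    heat_shocked = {"DnaK": 300, "GroEL": 205}
    print(significant_changes(controls, heat_shocked))
```

The more replicate phenotypes the database holds for a given cell state, the better the estimate of intrinsic variation, which is why replicate number drives the resource cost discussed above.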

Proteomic 'signatures' are proteomic phenotypes that characterize a biological response. Signatures can be composed of one or more protein abundances that decrease, increase or do not change. For example, translational inhibitors can be divided into two groups based on the proteins they induce: some induce the cold-shock proteins, while others induce the heat-shock proteins. In E. coli, scientists have identified signatures that are associated with DNA synthesis inhibition, translational inhibition and protein secretion inhibition, among others. What distinguishes signatures from phenotypes is specificity and reproducibility.

There are three key elements in defining and exploiting a proteomic signature: a defined cell state, a database of phenotypes and a tool for pattern recognition. Working within a defined cell state helps to reduce baseline variation, which can obscure a signature. A database of proteomic phenotypes helps to discern reproducible responses from one-off observations. Pattern recognition tools to aid in the identification of proteomic signatures are being developed, but there is still a heavy reliance upon manual techniques. The approach rests on a common principle in proteomic research: 'let the cells tell us' about the underlying mechanism.
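The pattern-recognition element can be pictured with a very small sketch. It assumes a signature is stored as a mapping from protein name to expected direction of change ("up", "down" or "same"), following the description of signatures as abundances that increase, decrease or do not change. The signature names, protein labels and matching rule are hypothetical, and real tools would need to tolerate noise rather than demand exact matches.

```python
def matches_signature(phenotype, signature):
    """True if every protein in the signature shows the expected direction
    of change ('up', 'down' or 'same') in the observed phenotype."""
    return all(phenotype.get(protein) == direction
               for protein, direction in signature.items())


def identify_signatures(phenotype, signature_db):
    """Return the names of all stored signatures that the phenotype matches."""
    return sorted(name for name, signature in signature_db.items()
                  if matches_signature(phenotype, signature))


if __name__ == "__main__":
    # Hypothetical signature database; protein labels are placeholders.
    signature_db = {
        "heat-shock-like": {"protein_A": "up", "protein_B": "up"},
        "cold-shock-like": {"protein_C": "up", "protein_A": "same"},
    }
    observed = {"protein_A": "up", "protein_B": "up", "protein_C": "same"}
    print(identify_signatures(observed, signature_db))
```

Exact matching is the strictest possible rule; a scored or tolerance-based comparison would be a natural refinement once the database records how variable each protein is.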

Conclusion

Biologists tell us that the DNA in a gene or several genes creates a related chemical substance, RNA, which in turn generates the proteins that form each of us and our individual characteristics. This is one of the reasons why researchers are now turning their attention to proteomics to study the function and expression of proteins in both healthy and diseased cells. Approximately 30,000 genes have been identified in the human genome, each of which produces an average of four different proteins, so there could be 120,000 or more unique proteins in the normal human 'proteome'. The challenge of identifying every expressed protein is therefore enormous and is considered more difficult than decoding the human genome.

Despite tremendous advances in the sensitivity and quantitative aspects of proteomics, most cellular proteins occur at concentrations that are below the level of detection. Very high or very low molecular weight proteins are also not easily measured, and solubility constraints make it difficult to study membrane proteins. The cost of proteomic research is also a major problem. Since the publication of the human genome sequence in Nature, a large number of "omics" terms have been coined in science, and it is humorously said that 'economics is an important member of the "omics" family'.

In parallel with commercial projects, an organization has been established to achieve for the proteome what the Human Genome Mapping Project did for the genome. This is the Human Proteome Organization (HUPO), an international consortium of academic and industrial partners, launched in February 2001 with funding of US$ 1 billion. It aims to identify every expressed protein and its variations within the next 5-10 years, and it eventually hopes to create a publicly funded Human Proteome Project.

For the last decade, genome sequencing has displaced protein databases in the minds of most molecular biologists. However, now that the human genome sequence is nearing completion, a major realignment is under way that brings proteins back to the center of biological thinking.

- The author is with Technology Development Center, Dr Reddy's Laboratories, Hyderabad.
