The discovery of a new drug, which is essentially the process of identifying a new molecule or compound, presents both scientific and analytical challenges. Pharmaceutical companies currently have to deal with long cycle times from drug discovery to commercialization and increased risk due to regulatory pressures. Powerful technology solutions enable researchers to overcome these challenges and have therefore become an essential component of the drug discovery and development process.
Pharma companies are increasingly becoming aware of the potential of genome research with many of the large pharmaceutical companies realizing that the route to the "drugs of tomorrow" increasingly begins with the genes discovered today. High throughput sequencing and automated genotyping technologies have resulted in many important genetic discoveries, paving the way for new technologies in drug development. This new genomics revolution will undoubtedly change the face of biomedical research.
In the pharmaceutical industry, bioinformatics plays an important role in all three key workspaces: discovery, preclinical research and clinical trials / research. The main objective is to develop goal-centric solutions and tools in each of these three workspaces applicable to the pharmaceutical industry. This has also facilitated the integrated storage, subsequent querying, analysis, and unified visualization of data.
Discovery: Proteins are encoded by genes and carry out most of the biological functions in the human body. Faulty genes or proteins are often responsible for causing diseases. Large-scale experimental analyses are essential to decipher and detect the processes occurring in most human diseases (diagnostics) and thus identify the biological targets. Only then can this information be used to discover a compound or compounds that act specifically and selectively on the particular target so that the disease may be effectively treated (therapeutics). High throughput genomic and post-genomic technologies like sequencing, proteomics and microarrays are used to discover the genes / proteins involved in a particular disease rapidly and precisely. However, these high-throughput techniques, which are responsible for a revolution in the drug discovery process, present exciting and difficult data management and integration issues.
The data warehouse comprises different datamarts for the entire sequencing pipeline including quality tracking, laboratory information management system (LIMS) for managing the laboratory workflow and sequence assembly. The LIMS tracks various phases of lab workflow such as addition of reagents to the samples (DNA fragments) that are to be sequenced, transferring the samples from one plate to another, and processing of the samples at several decks (machines). The warehouse also integrates data related to the sequencing process, generates project management reports based on read lengths and quality scores, and links sequence quality to various factors of lab workflow (materials, plates, decks involved in the sequencing process and temperature/humidity of various phases of the workflow), enabling users to identify the factor(s) that led to drop/rise in the sequencing quality.
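The idea of linking sequence quality to lab workflow factors can be illustrated with a minimal sketch. The record layout, field names and values below are hypothetical and are not taken from the warehouse described above; the sketch only shows how grouping reads by one factor (for example, the deck) can point to the value associated with a drop in quality.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical read records: each read has a quality score plus the lab
# factors recorded for it. All names and values are illustrative only.
reads = [
    {"read_id": "r1", "quality": 38.0, "deck": "deck_A", "plate": "P1"},
    {"read_id": "r2", "quality": 36.0, "deck": "deck_A", "plate": "P2"},
    {"read_id": "r3", "quality": 22.0, "deck": "deck_B", "plate": "P1"},
    {"read_id": "r4", "quality": 24.0, "deck": "deck_B", "plate": "P2"},
]


def mean_quality_by_factor(records, factor):
    """Return {factor value: mean quality score} for one workflow factor."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec[factor]].append(rec["quality"])
    return {value: mean(scores) for value, scores in grouped.items()}


def lowest_quality_value(records, factor):
    """Return the factor value whose reads have the lowest mean quality."""
    means = mean_quality_by_factor(records, factor)
    return min(means, key=means.get)


if __name__ == "__main__":
    print(mean_quality_by_factor(reads, "deck"))
    print(lowest_quality_value(reads, "deck"))
```

A production warehouse would typically run this kind of aggregation as a query over its datamarts and would consider several factors together; the single-factor grouping here is only meant to show the principle.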
*A tool to identify mutations from sequencing data - Mutation Viewer
Mutation Viewer identifies and visualizes mutations (Insertion, Deletion, Substitution or InDel) in a particular gene or a group of genes responsible for causing the disease in question, in patient samples, by comparing the observed sequence with RefSeq sequences available from NCBI. Mutations present over important domains are seen visually. Known SNPs can also be viewed.
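Comparing an observed sequence with a reference to classify differences can be sketched as follows. This is not how Mutation Viewer is implemented; it is a simplified illustration that uses Python's standard `difflib.SequenceMatcher` in place of a proper sequence alignment, and the short sequences are made up for the example.

```python
from difflib import SequenceMatcher


def find_mutations(reference, observed):
    """Classify differences between a reference and an observed sequence.

    Returns a list of (kind, reference_position, ref_bases, obs_bases),
    with 0-based positions in the reference. Uses difflib as a stand-in
    for a real alignment algorithm, so results on long or repetitive
    sequences may differ from those of a dedicated aligner.
    """
    matcher = SequenceMatcher(None, reference, observed, autojunk=False)
    mutations = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        ref_bases = reference[i1:i2]
        obs_bases = observed[j1:j2]
        if tag == "equal":
            continue
        if tag == "insert":
            kind = "insertion"
        elif tag == "delete":
            kind = "deletion"
        elif len(ref_bases) == len(obs_bases):
            kind = "substitution"
        else:
            # Replaced span of different length: treat as a combined InDel.
            kind = "indel"
        mutations.append((kind, i1, ref_bases, obs_bases))
    return mutations


if __name__ == "__main__":
    # Illustrative sequences only, not real RefSeq data.
    print(find_mutations("ATGGCCTTA", "ATGGACTTA"))
    print(find_mutations("ATGGCCTTA", "ATGGCTTA"))
```

In practice the reference would be a RefSeq record and the comparison would be done by an alignment tool; the classification step after alignment follows the same pattern.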
*A JAVA based laboratory information management system (LIMS) for proteomics and microarrays
*LIMS for microarray provides a complete solution for management of microarray experiments within a secure environment. LIMS for proteomics seamlessly integrates the workflow from initial sample characterization through gel analysis to protein identification. The workflow supports 2D-gels including DIGE and mass spectrometry. The LIMS manages multiple roles and users, obtains information about the lab element the user is working on and generates reports.
*Annotation server to integrate publicly available information
*The annotation server contains annotation information downloaded from various publicly available gene annotation databases such as UniGene, Entrez Gene, HomoloGene, UniSTS, dbSNP, BioCarta, PubMed and Gene Ontology (GO). It uses this information to annotate genes corresponding to probes on the microarray.
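Annotating the genes that correspond to microarray probes is essentially a two-step lookup: probe to gene, then gene to annotation. The sketch below assumes simple in-memory tables with placeholder identifiers; the probe IDs, gene names and annotation fields are hypothetical and do not reflect the actual schema of the annotation server.

```python
# Hypothetical probe-to-gene mapping for one microarray design.
probe_to_gene = {
    "probe_001": "GENE_A",
    "probe_002": "GENE_B",
    "probe_003": None,  # probe with no known gene assignment
}

# Hypothetical annotations collected from public sources, keyed by gene.
gene_annotations = {
    "GENE_A": {"go_terms": ["example process 1"], "pathways": ["example pathway"]},
    "GENE_B": {"go_terms": ["example process 2"], "pathways": []},
}


def annotate_probes(probe_ids, probe_map, annotations):
    """Return {probe_id: annotation dict, or None if nothing is known}."""
    result = {}
    for probe_id in probe_ids:
        gene = probe_map.get(probe_id)
        if gene is None or gene not in annotations:
            result[probe_id] = None
        else:
            # Copy the annotation and record which gene it came from.
            result[probe_id] = {"gene": gene, **annotations[gene]}
    return result


if __name__ == "__main__":
    annotated = annotate_probes(
        ["probe_001", "probe_003", "probe_999"], probe_to_gene, gene_annotations
    )
    for probe_id, info in annotated.items():
        print(probe_id, info)
```

Probes without a gene assignment are kept in the result with `None` so that downstream analysis can distinguish "no annotation available" from "probe not on the array".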
*A C++ application to analyze experimental data in context with annotations
This application interfaces with the gene annotation server described above, facilitating analysis of global gene expression data generated using microarrays. It is essentially a visualization tool for genomics data. The genomic data is specified in an XML format and then rendered as a 3D scatter plot and/or a 2D bar chart.
Preclinical Research: Once a candidate drug is obtained, its potential to develop into a drug depends on its efficacy in humans.
Clinical Research: In clinical research the researcher carries out studies on human beings, as opposed to the animal models used in the preclinical phase. In this phase, the drug is tested on healthy subjects and on patients suffering from the target disease, with the aim of studying side effects, dosage, etc. before and after commercialization of the drug.
*A web based data management tool for tissue banks - caTISSUE
caTISSUE is a web-based informatics system that helps in the collection, processing, storage, and distribution of human specimens for correlative scientific cancer research. It keeps track of multiple specimens from the participant, tracks refined materials (RNA, DNA, protein) used for molecular analysis, and annotates bio-specimens with accumulating experimental data. caTISSUE Core also helps reduce the functional complexity that bio-specimen banks face in collecting, processing, storing, and distributing these specimens.
*A data warehouse to assist researchers in comparing the analyzed clinical datasets (symptoms / phenotype) with the genetic information (genotype)
The warehouse contains information about SNPs from dbSNP, clinical trial data and genetic information. This comprehensive system enables researchers to analyze pharmacogenomic and clinical data in-house.
*Clinical trial management system
This system aids clinical trial coordinators in tracking and steering the performance of a clinical study. It helps to predict the final number of recruited subjects for a study/country/center based on the current recruitment rate and allows the user to view this information using graphs, plots and charts. The system also allows trial coordinators to make projections.
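Predicting the final number of recruited subjects from the current recruitment rate can be illustrated with a simple linear projection. The formula, function names and figures below are assumptions made for the example; the source does not describe the prediction method the system actually uses.

```python
def project_final_recruitment(recruited_so_far, days_elapsed, total_days):
    """Linearly project the final number of recruited subjects.

    Assumes the average recruitment rate observed so far continues
    unchanged for the rest of the recruitment period.
    """
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    rate_per_day = recruited_so_far / days_elapsed
    days_remaining = max(total_days - days_elapsed, 0)
    return recruited_so_far + rate_per_day * days_remaining


def project_study(centers, days_elapsed, total_days):
    """Sum the per-center projections to get a study-level projection."""
    return sum(
        project_final_recruitment(count, days_elapsed, total_days)
        for count in centers.values()
    )


if __name__ == "__main__":
    # Hypothetical recruitment counts per center after 60 of 180 days.
    centers = {"center_1": 30, "center_2": 12}
    for name, count in centers.items():
        print(name, project_final_recruitment(count, 60, 180))
    print("study total", project_study(centers, 60, 180))
```

A constant-rate assumption is the simplest possible model; real recruitment often slows or accelerates over time, so a production system would likely use a more careful forecast.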
The information management systems and analysis tools developed by Persistent Systems (Fig.1) have helped medical researchers and those in the pharmaceutical industry to effectively manage and analyze large volumes of data. This can help speed up the process of drug discovery and development.
(Armaity Davierwala is Life Sciences Consultant and Mushtaq Ahmed is Bioinformatics Consultant, Persistent Systems Pvt Ltd)