An Indian scientist has designed a method to predict the functions of novel proteins, a new development in the structural analysis of proteins under remote homology, a branch of bioinformatics that analyses protein sequences.
The new method prepares substructures of known proteins and uses them to identify similar or corresponding protein sequences, from which the function of a novel protein can be predicted. It will shorten the timeline for identifying novel proteins and help biotherapeutic companies in their research on new proteins, according to Dr Ashish Tendulkar, technical lead in the Bio Informatics & Analytics domain at the Pune-based Persistent Systems.
The work addresses the structure-function relationship of proteins, which enables the elucidation of protein function, the most important task of the post-genomics era. The analysis was conducted using geometric and machine learning techniques, breaking known proteins into substructures with the goals of protein structure prediction and functional classification of protein structures, he added. A protein data bank containing the substructures of known proteins has been prepared and used to compare functional sites, he said.
"We can use the substructures of a known protein to compare with the unknown protein and since the function of the known protein sequence is established, the functional cluster of the unknown protein can be identified through machine learning techniques," said Dr Tendulkar. The new method comes as another step in genome-wide function prediction techniques.
Last month, Dr Tendulkar received the Young Scientists Award from the Department of Biotechnology, Government of India, for his contribution to genomics through the invention. The rights to the novel technology rest with the scientist at present. "I am thankful to Persistent Systems for supporting me in the research and the further plans on the project will be carried out through the company," he added.
Specifically, the scientist represented the geometries of substructures in proteins with unilateral structure descriptors in the form of geometric invariants and applied clustering to form groups of similar geometries. This approach enables efficient and scalable all-against-all comparison between substructures and the application of standard data mining techniques. The method thus overcomes the limitations inherent in the pairwise comparison step employed in state-of-the-art techniques, he said.
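The report does not spell out the descriptors or the clustering algorithm, so the Python sketch below only illustrates the general idea and is not Dr Tendulkar's implementation. It assumes a substructure is a short list of 3D atom coordinates, uses sorted inter-atom distances as a stand-in geometric invariant, and replaces real clustering with simple binning of those distances. All function names, coordinates and the labels "function A" and "function B" are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def distance_invariants(coords):
    """Sorted pairwise distances: unchanged by rotating, shifting or reordering the points."""
    return sorted(math.dist(a, b) for a, b in combinations(coords, 2))


def cluster_key(coords, bin_width=0.5):
    """Quantise the invariants so that near-identical geometries share one key.

    This binning is an assumed stand-in for a real clustering algorithm.
    """
    return tuple(round(d / bin_width) for d in distance_invariants(coords))


def build_index(known):
    """Group labelled substructures of known proteins by geometry key."""
    index = {}
    for coords, label in known:
        index.setdefault(cluster_key(coords), Counter())[label] += 1
    return index


def predict_function(index, coords):
    """Return the most common label in the query's group, or None if the group is empty."""
    votes = index.get(cluster_key(coords))
    return votes.most_common(1)[0][0] if votes else None


def rotate_z_and_shift(coords, angle, shift):
    """Rotate points about the z axis and translate them (used only to build a test query)."""
    c, s = math.cos(angle), math.sin(angle)
    dx, dy, dz = shift
    return [(c * x - s * y + dx, s * x + c * y + dy, z + dz) for x, y, z in coords]


# Hypothetical substructures: a bent fragment and a straight one.
bent = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 0.0, 3.8)]
straight = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]

index = build_index([(bent, "function A"), (straight, "function B")])

# The "unknown" substructure is the bent fragment in a different orientation.
query = rotate_z_and_shift(bent, 1.0, (10.0, -4.0, 2.5))
print(predict_function(index, query))  # function A
```

Because each substructure is reduced to a key once, an unknown substructure is matched by a single dictionary lookup, not by aligning it against every known substructure in turn. That is the sense in which descriptor-based grouping avoids a pairwise comparison step. Two caveats apply to this simplified version: distance-only invariants cannot tell a geometry from its mirror image, and fixed-width bins can separate geometries that fall just either side of a bin boundary.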