Ali Rathore

Institution: Chabot College
Year: 2008

Mining Significant Molecular Substructures using Functional Groups

We have developed a method for finding significant substructures in large molecular databases. Significant substructures help chemists discover useful properties of chemical compounds, such as a compound's activity against certain diseases. For example, a substructure that occurs rarely in a universal database of all compounds but frequently in the subset of compounds active against cancer is considered significant. Applications of significant substructures include drug discovery and development. To find significant substructures, each chemical compound must first be represented as a set of features. Hence, a crucial aspect of chemical data mining is identifying features that characterize chemical compounds accurately and efficiently. Our method, known as GraphSig, uses not only atoms and bonds but also functional groups as features, incorporating chemical rather than purely structural information into the data mining procedure. A functional group is a group of atoms within a compound that is known to be responsible for the reactions the compound undergoes; it therefore carries relevant chemical information about the compound that would otherwise be lost. To explore the potential of significant substructures, we use them to classify the compounds in a given database by predicting their activity against a particular disease. Empirical analysis shows that GraphSig outperforms state-of-the-art classification methods in both accuracy and scalability.
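
The notion of significance described above (rare in the universal database, frequent among active compounds) can be illustrated with a small sketch. This is not the GraphSig procedure itself, which the abstract does not specify; it assumes, purely for illustration, that significance is scored with a one-sided binomial tail probability. The function names and all counts below are hypothetical.

```python
from math import comb


def binomial_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def substructure_p_value(background_hits, background_size, active_hits, active_size):
    """Chance of seeing at least `active_hits` occurrences among the active
    compounds if the substructure were no more common there than in the
    universal (background) database. Assumed scoring rule, for illustration."""
    background_rate = background_hits / background_size
    return binomial_tail(active_size, active_hits, background_rate)


# Hypothetical counts: a substructure found in 50 of 10,000 compounds overall
# but in 12 of 100 active compounds, versus one that is common everywhere.
rare_but_enriched = substructure_p_value(50, 10_000, 12, 100)
common_everywhere = substructure_p_value(5_000, 10_000, 52, 100)

print(f"rare but enriched: p = {rare_but_enriched:.3g}")
print(f"common everywhere: p = {common_everywhere:.3g}")
```

A small p-value flags the first substructure as significant under this assumed rule, while the second, which is about as frequent among active compounds as in the whole database, is not.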

UC Santa Barbara Center for Science and Engineering Partnerships UCSB California NanoSystems Institute