147c Using Atomic Properties to Identify Potential Reactants in Oxidoreductase Reactions

Fangping Mu (1), Pat J. Unkefer (2), Clifford J. Unkefer (2), and William S. Hlavacek (1). (1) Theoretical Biology and Biophysics Group, Theoretical Division, Los Alamos National Laboratory, T-10, MS710, Los Alamos, NM 87544, (2) National Stable Isotope Resource, Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87544

Motivation: Metabolism is the biochemical modification of chemical compounds in living organisms and cells. Our knowledge of secondary metabolism of endogenous chemicals is far from complete. The gaps in our knowledge are being revealed and filled to some extent through methods of high-throughput metabolite profiling. Using sensitive LC/MS techniques, we are identifying compounds not previously known to exist in living cells. Thus, we are interested in using computational methods to identify the activities of these new metabolites and their participation in biochemical pathways. One challenge is to predict whether a compound is a potential substrate or product of a given type of enzymatic reaction. We have developed a framework to address this challenge through a supervised learning technique. In this talk, we will present this framework and results for oxidoreductase reactions.

Results: We examined 1956 oxidoreductase reactions in the KEGG database. The vast majority of these reactions (1626, or about 83%) can be divided into twelve subclasses, each of which involves a particular type of chemical transformation. Examples of transformations include dehydrogenation of alcohols and hemi-acetals, oxidation of aldehyde or oxo groups to acids, and dehydrogenation of CH-CH groups to introduce double bonds. For a given transformation, the local structures of reaction centers in substrates and products are characterized by a few simple patterns. These patterns are not unique to reactants but are widely distributed among KEGG metabolites. To distinguish reactants from non-reactants, we trained classifiers (linear kernel support vector machines) using positive and negative samples that contain these common patterns. The input to a classifier is a set of seven semi-empirical atomic features, such as Gasteiger-Marsili partial charges, that can be determined from the 2D chemical structure of a compound. When asking whether a compound is a substrate, the accuracy of prediction is 64 to 93% for positives and 44 to 92% for negatives; when asking whether a compound is a product, it is 71 to 98% for positives and 50 to 92% for negatives. Sensitivity analysis reveals that this performance is robust to variations of the training data. Thus, metabolic connectivity can be predicted with reasonable accuracy from the presence or absence of local structural motifs in compounds and their readily calculated atomic features.
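
As an illustration of the classification step described above, the following is a minimal sketch of a linear support vector machine trained on seven-feature vectors, with accuracy reported separately for positives and negatives. It is not the implementation used in this work: the feature values are synthetic placeholders rather than real atomic properties, the training loop is a plain subgradient descent on the regularized hinge loss, and all names (train_linear_svm, per_class_accuracy, make_samples) and parameter values are assumptions chosen for the example.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Fit a linear SVM by batch subgradient descent on the L2-regularized hinge loss.

    X is a list of feature vectors, y a list of labels in {+1, -1}.
    Hyperparameters are illustrative assumptions, not tuned values.
    """
    n_features = len(X[0])
    n_samples = len(X)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1.0:  # sample violates the margin
                for j in range(n_features):
                    grad_w[j] -= yi * xi[j] / n_samples
                grad_b -= yi / n_samples
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, X):
    """Return +1 (predicted reactant) or -1 (predicted non-reactant) per sample."""
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else -1 for xi in X]


def per_class_accuracy(y_true, y_pred):
    """Return (accuracy on positives, accuracy on negatives)."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == -1]
    return sum(p == 1 for p in pos) / len(pos), sum(p == -1 for p in neg) / len(neg)


def make_samples(rng, n_per_class, n_features=7):
    """Generate SYNTHETIC placeholder data: seven features per sample.

    Real inputs would be atomic properties computed at the reaction center;
    here positives and negatives are simply drawn around +1 and -1.
    """
    X, y = [], []
    for label in (1, -1):
        for _ in range(n_per_class):
            X.append([rng.gauss(label, 0.3) for _ in range(n_features)])
            y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_samples(rng, 40)
X_test, y_test = make_samples(rng, 20)

w, b = train_linear_svm(X_train, y_train)
pos_acc, neg_acc = per_class_accuracy(y_test, predict(w, b, X_test))
print(f"accuracy on positives: {pos_acc:.2f}, on negatives: {neg_acc:.2f}")
```

Reporting the two accuracies separately, rather than a single overall figure, mirrors the way the results are stated for positives and negatives; with unbalanced classes a single number could hide poor performance on one of them.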