218b De Novo Peptide Identification Via Mixed-Integer Linear Optimization and Tandem Mass Spectrometry

Christodoulos A. Floudas and Peter A. DiMaggio. Chemical Engineering, Princeton University, Princeton, NJ 08544

Of fundamental importance in proteomics is the problem of protein and peptide identification. Over the past couple of decades, tandem mass spectrometry (MS/MS) coupled with high-performance liquid chromatography (HPLC) has emerged as a powerful experimental technique for the effective sequencing of peptides and proteins. In recognition of the extensive amount of sequence information embedded in a single mass spectrum, MS/MS has served as an impetus for the recent development of numerous computational approaches formulated to sequence peptides robustly and efficiently, with particular emphasis on the integration of these algorithms into a high-throughput computational framework for proteomics. The two most frequent computational approaches reported in the literature are (a) de novo and (b) database search methods, both of which can utilize deterministic, probabilistic and/or stochastic solution techniques. De novo methods have received considerable interest since they are the only efficient means for applications such as finding novel proteins, identifying post-translational modifications and studying the proteome before the genome. The majority of de novo methods to date utilize dynamic programming techniques to solve the peptide identification problem [1]-[5].

In this work, we present a novel algorithm for the de novo sequencing of peptides using tandem mass spectrometry and mixed-integer linear optimization (MILP) [6]. A two-stage framework is employed to accommodate missing peaks in the tandem mass spectrum: the first stage sequences candidate peptides using single amino acid weights, and the second stage allows combinations of two to three amino acid weights to be used in the construction of the candidate sequences. A preprocessing algorithm identifies important ions in the tandem mass spectrum that can be exploited in the problem formulation, such as ion peaks corresponding to the N- or C-terminus of the peptide and offsets indicative of post-translational modifications. Residue assignment ambiguities are subsequently resolved using a modified SEQUEST algorithm [7] so as to exploit information in the tandem mass spectrum that was not utilized in the sequencing calculations. This post-processing component of the method replaces the combined weights in the candidate peptide sequences derived from the second-stage calculations with permutations of amino acids consistent with these weights. The theoretical tandem mass spectrum for each candidate sequence is predicted and cross-correlated with the experimental tandem mass spectrum, and the highest-scoring sequence is reported as the most probable peptide. The significant contributions of this work include the generation of rank-ordered lists of candidate sequences and the direct incorporation of complementary ions into the sequencing calculations via constraint equations. Several computational studies will be presented to demonstrate the predictive capabilities and instrument independence of the proposed approach.
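The ideas of complementary ions, missing peaks and residue assignment ambiguity can be illustrated with a minimal Python sketch. It is not the MILP formulation of [6] and not the SEQUEST cross-correlation of [7]; it only enumerates mass matches by brute force. The function names (`b_ions`, `y_ions`, `explain_gap`), the 0.02 Da tolerance, the reduced residue table and the example peptide GASK are assumptions made for illustration. Only singly charged b- and y-ions with monoisotopic masses are considered.

```python
from itertools import combinations_with_replacement

# Monoisotopic residue masses in Da (illustrative subset of the standard residues).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "R": 156.10111,
}
PROTON = 1.00728   # mass of a proton, Da
WATER = 18.01056   # monoisotopic mass of H2O, Da


def peptide_mass(seq):
    """Neutral monoisotopic mass of the peptide (residues plus one water)."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def b_ions(seq):
    """Singly charged b-ion m/z values for prefixes of length 1..n-1."""
    total, out = PROTON, []
    for aa in seq[:-1]:
        total += RESIDUE_MASS[aa]
        out.append(total)
    return out


def y_ions(seq):
    """Singly charged y-ion m/z values for suffixes of length 1..n-1."""
    total, out = PROTON + WATER, []
    for aa in reversed(seq[1:]):
        total += RESIDUE_MASS[aa]
        out.append(total)
    return out


def explain_gap(gap, max_residues=1, tol=0.02):
    """Unordered residue combinations (up to max_residues) whose mass matches gap."""
    hits = []
    for k in range(1, max_residues + 1):
        for combo in combinations_with_replacement(sorted(RESIDUE_MASS), k):
            if abs(sum(RESIDUE_MASS[aa] for aa in combo) - gap) <= tol:
                hits.append("".join(combo))
    return hits


seq = "GASK"  # hypothetical example peptide
b, y = b_ions(seq), y_ions(seq)

# Complementary ions: b_i + y_(n-i) equals the peptide mass plus two protons.
complement_sum = peptide_mass(seq) + 2 * PROTON
complement_ok = all(abs(b[i] + y[-1 - i] - complement_sum) < 1e-6 for i in range(len(b)))

# Suppose the b2 peak is missing, so only the gap between b1 and b3 is observed.
gap = b[2] - b[0]
stage_one = explain_gap(gap, max_residues=1)   # single amino acid weights only
stage_two = explain_gap(gap, max_residues=2)   # allow pairs of amino acid weights
print(complement_ok, stage_one, stage_two)
```

In this example no single residue explains the gap left by the missing peak, while allowing pairs yields two isobaric candidates, AS and GT. The sketch does not distinguish between them or between their orderings; that is the kind of ambiguity the post-processing stage is described as resolving by scoring predicted spectra against the experimental one.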

[1] V. Dancik, T.A. Addona, K.R. Clauser, J.E. Vath, and P.A. Pevzner. De novo peptide sequencing via tandem mass spectrometry. J. Comp. Biol., 6(3):327-342, 1999.

[2] T. Chen, M.Y. Kao, M. Tepel, J. Rush, and G.M. Church. A dynamic programming approach to de novo peptide sequencing via tandem mass spectrometry. J. Comp. Biol., 8(3):325-337, 2001.

[3] B. Lu and T. Chen. A suboptimal algorithm for de novo peptide sequencing via tandem mass spectrometry. J. Comp. Biol., 10(1):1-12, 2003.

[4] B. Ma, K.Z. Zhang, C. Hendrie, C. Liang, M. Li, A. Doherty-Kirby, and G. Lajoie. PEAKS: powerful software for peptide de novo sequencing by tandem mass spectrometry. Rapid Commun. Mass Spectrom., 17:2337-2342, 2003.

[5] A. Frank and P. Pevzner. PepNovo: De novo peptide sequencing via probabilistic network modeling. Anal. Chem., 77(4):964-973, 2005.

[6] P.A. DiMaggio and C.A. Floudas. De novo peptide identification via tandem mass spectrometry and mixed-integer optimization. Submitted for publication, 2006.

[7] J.K. Eng, A.L. McCormack, and J.R. Yates. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom., 5:976-989, 1994.