540e Physics-Based Protein Structure Prediction by Zipping and Assembly

M. Scott Shell¹, J. D. Chodera², S. B. Ozkan¹, V. A. Voelz², A. G. Wu¹, and Ken A. Dill¹. (1) Pharmaceutical Chemistry, University of California San Francisco, 600 16th Street, Box 2240, San Francisco, CA 94143, (2) Graduate Group in Biophysics, University of California San Francisco, 600 16th Street, Box 2240, San Francisco, CA 94143

In recent years it has become clear that the most significant bottleneck in simulating proteins with physics-based force fields is conformational sampling. Work by several groups using long, algorithmically advanced simulations with supercomputer-scale resources has shown that small proteins can be folded accurately given sufficient sampling [1-4]. These studies underscore the accuracy of modern energy functions and highlight the need for improved sampling strategies.

We have developed an automated protein structure prediction algorithm based on a novel sampling protocol called Zipping and Assembly of Proteins (ZAP). The concept of zipping is that folding proceeds as a sequence of conditional, topologically local steps: for example, an early native contact between residues 4 and 7 facilitates a subsequent native contact between residues 3 and 8 [5]. ZAP searches configurations by finding sequence-local contacts, enforcing these preferred contacts with a restraining potential, and detecting new contacts that have become effectively local because of the prior restraints; the process repeats until the native structure is reached.

In practice, the protein is first broken into small, overlapping fragments 8-12 amino acids in length, and each fragment is simulated to equilibrium using 5-8 ns of replica exchange molecular dynamics (REMD). Fragments that exhibit strong local contacts are grown by adding 2-3 residues to each end. The strong contacts are locked into place with harmonic restraints, and additional REMD simulation is performed. This cycle of growth, contact detection, and addition of new restraints continues for each fragment until no new contacts can be detected. At this stage, if growth has proceeded with more than one fragment, the fragments are assembled in a long REMD simulation. ZAP uses additional algorithms to generate candidate initial conformations for replica exchange and to detect incorrectly restrained (nonnative) contacts.
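The growth cycle described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the window stride, the `max_cycles` guard, and the `detect_contacts` callback (a stand-in for an REMD run followed by contact analysis) are all assumptions made for illustration. The restraint is written in the common form E = (k/2)(d - d0)^2; the abstract does not specify the functional form or force constant actually used.

```python
from typing import Callable, List, Set, Tuple

Fragment = Tuple[int, int]  # inclusive 0-based residue range (start, end)
Contact = Tuple[int, int]   # residue pair (i, j) with i < j


def overlapping_fragments(n_residues: int, length: int = 8,
                          stride: int = 4) -> List[Fragment]:
    """Cover residues 0..n_residues-1 with overlapping windows.

    `length` and `stride` are illustrative; the abstract only states
    that fragments are 8-12 residues long and overlap.
    """
    if n_residues <= length:
        return [(0, n_residues - 1)]
    starts = list(range(0, n_residues - length + 1, stride))
    if starts[-1] != n_residues - length:
        starts.append(n_residues - length)  # ensure the last residue is covered
    return [(s, s + length - 1) for s in starts]


def grow(fragment: Fragment, n_residues: int, k: int = 2) -> Fragment:
    """Extend a fragment by k residues at each end, clipped to the chain."""
    start, end = fragment
    return (max(0, start - k), min(n_residues - 1, end + k))


def harmonic_restraint(distance: float, target: float,
                       force_constant: float) -> float:
    """Harmonic restraint energy, E = (k/2) * (d - d0)**2 (assumed form)."""
    return 0.5 * force_constant * (distance - target) ** 2


def zip_fragment(
    fragment: Fragment,
    n_residues: int,
    detect_contacts: Callable[[Fragment, Set[Contact]], Set[Contact]],
    max_cycles: int = 50,
) -> Tuple[Fragment, Set[Contact]]:
    """Repeat: sample and detect contacts, restrain new ones, grow.

    `detect_contacts` is a hypothetical placeholder for simulating the
    fragment under the current restraints and returning strong contacts.
    The loop stops when no new contacts appear.
    """
    restraints: Set[Contact] = set()
    for _ in range(max_cycles):
        new = detect_contacts(fragment, restraints) - restraints
        if not new:
            break
        restraints |= new                      # lock in the new contacts
        fragment = grow(fragment, n_residues)  # add residues to each end
    return fragment, restraints
```

A toy `detect_contacts` that reports a contact (i, j) only when it is sequence-local or when the enclosed contact (i+1, j-1) is already restrained reproduces the zipping order from the example above: the 4-7 contact is found first, and only then does the 3-8 contact become detectable.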

Our initial work with the ZAP approach is very promising: it has folded 8 out of 9 proteins smaller than 75 amino acids in length to an average backbone RMSD of 2.2 Å. These simulations require roughly 1-2 CPU-years of total compute time (3-5 weeks of wall-clock time when parallelized over 20 processors). All computations are performed using automated Python wrapper codes built around the AMBER7 molecular simulation package with the 1996 Cornell et al. force field and GB/SA implicit solvation [6].
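As a quick consistency check on the quoted cost, the short Python calculation below converts the real-time (wall-clock) figures into CPU-years. It assumes all 20 processors are busy for the full duration, which the abstract does not state; the function name is illustrative.

```python
WEEKS_PER_YEAR = 365.25 / 7.0


def cpu_years(processors: int, wall_clock_weeks: float) -> float:
    """Total CPU time in years, assuming every processor is fully used."""
    return processors * wall_clock_weeks / WEEKS_PER_YEAR


low = cpu_years(20, 3)   # 3 weeks on 20 processors
high = cpu_years(20, 5)  # 5 weeks on 20 processors
print(f"{low:.2f} to {high:.2f} CPU-years")
```

Under that assumption the range comes out at about 1.15 to 1.92 CPU-years, in line with the stated estimate of roughly 1-2 CPU-years.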

[1] Y. Duan and P. A. Kollman, Science 282, 740 (1998).

[2] G. Jayachandran, V. Vishal, and V. S. Pande, Protein Sci. 13, 216 (2004).

[3] C. D. Snow, B. Zagrovic, and V. S. Pande, J. Am. Chem. Soc. 124, 14548 (2002).

[4] J. W. Pitera and W. Swope, Proc. Natl. Acad. Sci. U. S. A. 100, 7587 (2003).

[5] K. A. Dill, K. M. Fiebig, and H. S. Chan, Proc. Natl. Acad. Sci. U. S. A. 90, 1942 (1993).

[6] D. A. Case et al., AMBER 7 (University of California, San Francisco, 2002).