Abstract
The analysis of coevolutionary signals from families of evolutionarily related sequences is a recent conceptual framework that provides valuable information about unique intramolecular interactions and can therefore assist in the elucidation of biomolecular conformations. It is based on the idea that compensatory mutations at specific residue positions in a sequence help preserve the stability of protein architecture and function, and leave a statistical signature related to residue-residue interactions in the 3D structure of the protein. Consequently, statistical analysis of these correlated mutations in subsets of protein sequence alignments can be used to predict which residue pairs should be in spatial proximity in the native functional protein fold. These predicted signals can then be used to guide molecular dynamics (MD) simulations to predict the three-dimensional coordinates of a functional amino acid chain. In this chapter, we introduce a general and efficient methodology to perform coevolutionary analysis on protein sequences and to use this information in combination with computational physical models to predict the native 3D conformation of functional polypeptides. We present a step-by-step methodology that includes the description and application of the software tools and databases required to infer the tertiary structure of a protein fold. The general pipeline includes instructions on (1) how to obtain direct amino acid couplings from protein sequences using direct coupling analysis (DCA), (2) how to incorporate such signals as interaction potentials in Cα structure-based models (SBMs) to drive protein-folding MD simulations, (3) a procedure to estimate secondary structure and how to include such estimates in the topology files required in the MD simulations, and (4) how to build full atomic models based on the top Cα candidates selected in the pipeline.
The information presented in this chapter is self-contained and sufficient to allow a computational scientist to predict structures of proteins using publicly available algorithms and databases.
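The idea that correlated mutations leave a statistical signature in an alignment can be illustrated with a minimal sketch. The example below scores pairs of alignment columns by mutual information, which is a simpler, local covariation measure and not DCA itself: DCA fits a global statistical model in order to separate direct couplings from indirect correlations. The toy alignment, the function names, and the absence of sequence reweighting and pseudocounts are all assumptions made for illustration and are not taken from the chapter.

```python
from collections import Counter
from itertools import combinations
from math import log


def mutual_information(msa, i, j):
    """Mutual information (in nats) between alignment columns i and j.

    Illustrative only: no sequence reweighting, no pseudocounts,
    and no correction for indirect correlations (so this is not DCA).
    """
    n = len(msa)
    freq_i = Counter(seq[i] for seq in msa)
    freq_j = Counter(seq[j] for seq in msa)
    freq_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        p_a = freq_i[a] / n
        p_b = freq_j[b] / n
        mi += p_ab * log(p_ab / (p_a * p_b))
    return mi


def rank_pairs(msa, min_separation=1):
    """Return column pairs sorted by decreasing mutual information."""
    length = len(msa[0])
    scores = {
        (i, j): mutual_information(msa, i, j)
        for i, j in combinations(range(length), 2)
        if j - i >= min_separation
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy alignment: columns 0 and 2 mutate together,
# as do columns 1 and 3; all other column pairs vary independently.
toy_msa = [
    "AKLE",
    "AKLE",
    "ARLD",
    "ARLD",
    "GKVE",
    "GRVD",
]

ranked = rank_pairs(toy_msa)
for (i, j), score in ranked:
    print(f"columns {i}-{j}: MI = {score:.3f}")
```

In this toy case the two covarying column pairs receive the highest scores, while the independent pairs score essentially zero. On real protein families, pairs ranked highly by a covariation score are only candidate contacts, and a local measure such as this one is confounded by chains of indirect correlations, which is the limitation DCA is designed to address.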
Acknowledgments
The authors acknowledge financial support from the São Paulo Research Foundation (FAPESP) (Grants 2015/13667-9, 2010/16947-9, 2013/05475-7, and 2013/08293-7) and funding from the University of Texas at Dallas.
Copyright information
© 2019 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
dos Santos, R.N., Jiang, X., Martínez, L., Morcos, F. (2019). Coevolutionary Signals and Structure-Based Models for the Prediction of Protein Native Conformations. In: Sikosek, T. (eds) Computational Methods in Protein Evolution. Methods in Molecular Biology, vol 1851. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8736-8_5
DOI: https://doi.org/10.1007/978-1-4939-8736-8_5
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-8735-1
Online ISBN: 978-1-4939-8736-8
eBook Packages: Springer Protocols