Abstract
Thanks to the explosion of genomic sequencing, coevolutionary analysis of protein sequences has grown steadily in popularity over the last decade and is now a well-established tool in structural bioinformatics and computational biology. This chapter concisely introduces the theoretical foundation and practical aspects of coevolutionary analysis, and discusses molecular modeling strategies that exploit its results in the study of protein structure, dynamics, and interactions. We present a complete pipeline from sequence extraction to contact prediction through two examples: the prediction of inter-residue contacts in a single protein domain, and the analysis of a multi-domain protein that undergoes functional, large-scale conformational transitions.
Acknowledgements
The authors thank Paolo De Los Rios, Faruck Morcos, Elijah Irvine, Rémy Bailly, and Camille Elleaume for their critical reading of this manuscript. Duccio Malinverni acknowledges the support of the National Science Foundation under grants 2012_149278 and 20020_163042/1. Alessandro Barducci acknowledges the support of the Agence Nationale de Recherche (ANR) under grant ANR-14-ACHN-0016.
Copyright information
© 2019 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Malinverni, D., Barducci, A. (2019). Coevolutionary Analysis of Protein Sequences for Molecular Modeling. In: Bonomi, M., Camilloni, C. (eds) Biomolecular Simulations. Methods in Molecular Biology, vol 2022. Humana, New York, NY. https://doi.org/10.1007/978-1-4939-9608-7_16
DOI: https://doi.org/10.1007/978-1-4939-9608-7_16
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-4939-9607-0
Online ISBN: 978-1-4939-9608-7
eBook Packages: Springer Protocols