Abstract
Phylogenetic inference from protein data is traditionally based on empirical substitution models of evolution that assume that protein sites evolve independently of each other and under the same substitution process. However, it is well known that the structural properties of a protein site in the native state affect its evolution, in particular the sequence entropy and the substitution rate. Starting from the seminal proposal by Halpern and Bruno, where structural properties are incorporated in the evolutionary model through site-specific amino acid frequencies, several models have been developed to tackle the influence of protein structure on sequence evolution. Here we describe stability-constrained substitution (SCS) models that explicitly consider the stability of the native state against both unfolded and misfolded states. One of them, the mean-field model, provides an independent sites approximation that can be readily incorporated in maximum likelihood methods of phylogenetic inference, including ancestral sequence reconstruction. Next, we describe its validation with simulated and real proteins and its limitations and advantages with respect to empirical models that lack site specificity. We finally provide guidelines and recommendations to analyze protein data accounting for stability constraints, including computer simulations and inferences of protein evolution based on maximum likelihood. Some practical examples are included to illustrate these procedures.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Schmitt AO, Schuchhardt J, Ludwig A, Brockmann GA (2007) Protein evolution within and between species. J Theor Biol 249(2):376–383. https://doi.org/10.1016/j.jtbi.2007.08.001
Gao F, Bhattacharya T, Gaschen B, Taylor J, Moore JP, Novitsky V, Yusim K, Lang D, Foley B, Beddows S, Alam M, Haynes B, Hahn BH, Korber B (2003) Consensus and ancestral state HIV vaccines. Science 299(5612):1515–1518
Arenas M, Posada D (2010) Computational design of centralized HIV-1 genes. Curr HIV Res 8(8):613–621
Wilson C, Agafonov RV, Hoemberger M, Kutter S, Zorba A, Halpin J, Buosi V, Otten R, Waterman D, Theobald DL, Kern D (2015) Kinase dynamics. Using ancient protein kinases to unravel a modern cancer drug’s mechanism. Science 347(6224):882–886. https://doi.org/10.1126/science.aaa1823
Perez-Jimenez R, Ingles-Prieto A, Zhao ZM, Sanchez-Romero I, Alegre-Cebollada J, Kosuri P, Garcia-Manyes S, Kappock TJ, Tanokura M, Holmgren A, Sanchez-Ruiz JM, Gaucher EA, Fernandez JM (2011) Single-molecule paleoenzymology probes the chemistry of resurrected enzymes. Nat Struct Mol Biol 18(5):592–596
Wijma HJ, Floor RJ, Janssen DB (2013) Structure- and sequence-analysis inspired engineering of proteins for enhanced thermostability. Curr Opin Struct Biol 23(4):588–594. https://doi.org/10.1016/j.sbi.2013.04.008
Cole MF, Gaucher EA (2011) Utilizing natural diversity to evolve protein function: applications towards thermostability. Curr Opin Chem Biol 15(3):399–406. https://doi.org/10.1016/j.cbpa.2011.03.005
Arenas M (2015) Trends in substitution models of molecular evolution. Front Genet 6:319. https://doi.org/10.3389/fgene.2015.00319
Liberles DA, Teichmann SA, Bahar I, Bastolla U, Bloom J, Bornberg-Bauer E, Colwell LJ, de Koning AP, Dokholyan NV, Echave J, Elofsson A, Gerloff DL, Goldstein RA, Grahnen JA, Holder MT, Lakner C, Lartillot N, Lovell SC, Naylor G, Perica T, Pollock DD, Pupko T, Regan L, Roger A, Rubinstein N, Shakhnovich E, Sjolander K, Sunyaev S, Teufel AI, Thorne JL, Thornton JW, Weinreich DM, Whelan S (2012) The interface of protein structure, protein biophysics, and molecular evolution. Protein Sci 21(6):769–785
Bastolla U (2014) Detecting selection on protein stability through statistical mechanical models of folding and evolution. Biomol Ther 4:291–314
Wilke CO (2012) Bringing molecules back into molecular evolution. PLoS Comput Biol 8(6):e1002572
Sikosek T, Chan HS (2014) Biophysics of protein evolution and evolutionary protein biophysics. J R Soc Interface 11(100):20140419. https://doi.org/10.1098/rsif.2014.0419
Goldstein RA (2011) The evolution and evolutionary consequences of marginal thermostability in proteins. Proteins 79(5):1396–1407
Serohijos AW, Shakhnovich EI (2014) Merging molecular mechanism and evolution: theory and computation at the interface of biophysics and evolutionary population genetics. Curr Opin Struct Biol 26:84–91. https://doi.org/10.1016/j.sbi.2014.05.005
Bastolla U, Dehouck Y, Echave J (2017) What evolution tells us about protein physics, and protein physics tells us about evolution. Curr Opin Struct Biol 42:59–66. https://doi.org/10.1016/j.sbi.2016.10.020
Echave J (2008) Evolutionary divergence of protein structure: the linearly forced elastic network model. Chem Phys Lett 457(4):413–416. https://doi.org/10.1016/j.cplett.2008.04.042
Tirion MM (1996) Large amplitude elastic motions in proteins from a single-parameter, atomic analysis. Phys Rev Lett 77(9):1905–1908
Bahar I, Rader AJ (2005) Coarse-grained normal mode analysis in structural biology. Curr Opin Struct Biol 15(5):586–592. https://doi.org/10.1016/j.sbi.2005.08.007
Bornberg-Bauer E, Chan HS (1999) Modeling evolutionary landscapes: mutational stability, topology, and superfunnels in sequence space. Proc Natl Acad Sci U S A 96(19):10689–10694
Bastolla U, Porto M, Eduardo Roman MH, Vendruscolo MH (2003) Connectivity of neutral networks, overdispersion, and structural conservation in protein evolution. J Mol Evol 56(3):243–254
Lemmon AR, Moriarty EC (2004) The importance of proper model assumption in bayesian phylogenetics. Syst Biol 53(2):265–277
Zhang J (1999) Performance of likelihood ratio tests of evolutionary hypotheses under inadequate substitution models. Mol Biol Evol 16(6):868–875
Bordner AJ, Mittelmann HD (2013) A new formulation of protein evolutionary models that account for structural constraints. Mol Biol Evol 31(3):736–749
Rodrigue N, Lartillot N, Bryant D, Philippe H (2005) Site interdependence attributed to tertiary structure in amino acid sequence evolution. Gene 347(2):207–217
Arenas M, Sanchez-Cobos A, Bastolla U (2015) Maximum likelihood phylogenetic inference with selection on protein folding stability. Mol Biol Evol 32(8):2195–2207. https://doi.org/10.1093/molbev/msv085
Bastolla U, Porto M, Roman HE, Vendruscolo M (2006) A protein evolution model with independent sites that reproduces site-specific amino acid distributions from the Protein Data Bank. BMC Evol Biol 6:43
Anishchenko I, Ovchinnikov S, Kamisetty H, Baker D (2017) Origins of coevolution between residues distant in protein 3D structures. Proc Natl Acad Sci U S A 114:9122–9127. https://doi.org/10.1073/pnas.1702664114
Wang ZO, Pollock DD (2005) Context dependence and coevolution among amino acid residues in proteins. Methods Enzymol 395:779–790. https://doi.org/10.1016/S0076-6879(05)95040-4
Arenas M, Dos Santos HG, Posada D, Bastolla U (2013) Protein evolution along phylogenetic histories under structurally constrained substitution models. Bioinformatics 29(23):3020–3028
Echave J, Wilke CO (2017) Biophysical models of protein evolution: understanding the patterns of evolutionary sequence divergence. Annu Rev Biophys 46:85–103. https://doi.org/10.1146/annurev-biophys-070816-033819
Bastolla U, Farwer J, Knapp EW, Vendruscolo M (2001) How to guarantee optimal stability for most representative structures in the Protein Data Bank. Proteins 44(2):79–96
Minning J, Porto M, Bastolla U (2013) Detecting selection for negative design in proteins through an improved model of the misfolded state. Proteins 81(7):1102–1112. https://doi.org/10.1002/prot.24244
Sella G, Hirsh AE (2005) The application of statistical physics to evolutionary biology. Proc Natl Acad Sci U S A 102(27):9541–9546
Mustonen V, Lassig M (2005) Evolutionary population genetics of promoters: predicting binding sites and functional phylogenies. Proc Natl Acad Sci U S A 102(44):15936–15941. https://doi.org/10.1073/pnas.0505537102
Arenas M (2012) Simulation of molecular data under diverse evolutionary scenarios. PLoS Comput Biol 8(5):e1002495
Hoban S, Bertorelle G, Gaggiotti OE (2012) Computer simulations: tools for population and evolutionary genetics. Nat Rev Genet 13(2):110–122
Kingman JFC (1982) The coalescent. Stoch Process Appl 13:235–248
Posada D, Wiuf C (2003) Simulating haplotype blocks in the human genome. Bioinformatics 19(2):289–290
Arenas M, Posada D (2010) Coalescent simulation of intracodon recombination. Genetics 184(2):429–437
Arenas M (2013) Computer programs and methodologies for the simulation of DNA sequence data with recombination. Front Genet 4:9
Arenas M, Posada D (2014) Simulation of genome-wide evolution under heterogeneous substitution models and complex multispecies coalescent histories. Mol Biol Evol 31(5):1295–1301
Hudson RR (1998) Island models and the coalescent process. Mol Ecol 7(4):413–418
Yang Z (2006) Computational molecular evolution. Oxford University Press, Oxford
Abascal F, Zardoya R, Posada D (2005) ProtTest: selection of best-fit models of protein evolution. Bioinformatics 21(9):2104–2105
Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79–86
Marti-Renom MA, Stuart AC, Fiser A, Sanchez R, Melo F, Sali A (2000) Comparative protein structure modeling of genes and genomes. Annu Rev Biophys Biomol Struct 29:291–325
Halpern AL, Bruno WJ (1998) Evolutionary distances for protein-coding sequences: modeling site-specific residue frequencies. Mol Biol Evol 15(7):910–917
Whelan S, Goldman N (2001) A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol Biol Evol 18(5):691–699
Jones DT, Taylor WR, Thornton JM (1992) The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci 8(3):275–282
Arenas M, Weber CC, Liberles DA, Bastolla U (2017) ProtASR: an evolutionary framework for ancestral protein reconstruction with selection on folding stability. Syst Biol 66:1054–1064. https://doi.org/10.1093/sysbio/syw121
Yang Z (2007) PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 24(8):1586–1591
Yang Z (1997) PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci 13(5):555–556
Merkl R, Sterner R (2016) Ancestral protein reconstruction: techniques and applications. Biol Chem 397(1):1–21. https://doi.org/10.1515/hsz-2015-0158
Liberles DA (2007) Ancestral sequence reconstruction. Oxford University Press, Oxford
Kothe DL, Li Y, Decker JM, Bibollet-Ruche F, Zammit KP, Salazar MG, Chen Y, Weng Z, Weaver EA, Gao F, Haynes BF, Shaw GM, Korber BT, Hahn BH (2006) Ancestral and consensus envelope immunogens for HIV-1 subtype C. Virology 352(2):438–449
Gaucher EA, Govindarajan S, Ganesh OK (2008) Palaeotemperature trend for Precambrian life inferred from resurrected proteins. Nature 451(7179):704–707
Hobbs JK, Shepherd C, Saul DJ, Demetras NJ, Haaning S, Monk CR, Daniel RM, Arcus VL (2012) On the origin and evolution of thermophily: reconstruction of functional precambrian enzymes from ancestors of Bacillus. Mol Biol Evol 29(2):825–835. https://doi.org/10.1093/molbev/msr253
Bastolla U, Moya A, Viguera E, van Ham RC (2004) Genomic determinants of protein folding thermodynamics in prokaryotic organisms. J Mol Biol 343(5):1451–1466
Williams PD, Pollock DD, Blackburne BP, Goldstein RA (2006) Assessing the accuracy of ancestral protein reconstruction methods. PLoS Comput Biol 2(6):e69
Lartillot N, Lepage T, Blanquart S (2009) PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25(17):2286–2288. https://doi.org/10.1093/bioinformatics/btp368
Lartillot N, Philippe H (2004) A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol Biol Evol 21(6):1095–1109
Mustonen V, Lassig M (2009) From fitness landscapes to seascapes: non-equilibrium dynamics of selection and adaptation. Trends Genet 25(3):111–119. https://doi.org/10.1016/j.tig.2009.01.002
Arenas M, Patricio M, Posada D, Valiente G (2010) Characterization of phylogenetic networks with NetTest. BMC Bioinformatics 11(1):268
Acknowledgments
M.A. was supported by the grant “Ramón y Cajal” RYC-2015-18241 from the Spanish Government. U.B. is supported by the grant BIO2016-79043 from the Spanish Ministry of Economy.
Author information
Authors and Affiliations
Corresponding authors
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Bastolla, U., Arenas, M. (2019). The Influence of Protein Stability on Sequence Evolution: Applications to Phylogenetic Inference. In: Sikosek, T. (eds) Computational Methods in Protein Evolution. Methods in Molecular Biology, vol 1851. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8736-8_11
Download citation
DOI: https://doi.org/10.1007/978-1-4939-8736-8_11
Published:
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-8735-1
Online ISBN: 978-1-4939-8736-8
eBook Packages: Springer Protocols