Abstract
For highly divergent sequences, there is often insufficient information to reliably construct alignments and phylogenetic trees. Since protein structure may be strongly conserved despite large divergences in sequence, structural information can be used to help identify homology in such cases.
While there exist well-studied models of sequence evolution, structurally informed alignment methods have typically made use of geometric measures of deviation that do not take into account the underlying mutational processes. In order to integrate structural information into sequence-based evolutionary models, we recently developed a stochastic model of structural evolution on a phylogenetic tree and implemented this as the StructAlign plugin for the StatAlign statistical alignment package.
In this chapter, we will outline the types of analyses that can be carried out using StructAlign, illustrating how the inclusion of structural information can be used to inform joint estimation of alignments and trees. StructAlign can also be used to infer branch-specific rates of structural evolution, and analysis of an example globin dataset highlights strong variation in the inferred rate across the tree. While structure is more highly conserved within clades, the rate of structural divergence as a function of sequence variation is larger between functionally divergent proteins. Allowing for the rate of structural divergence to vary over the tree results in an improved fit to the empirically observed pairwise RMSD values.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Change history
22 December 2018
The published version of this book included errors in code listings in Chapter 10. These code listings have been corrected and text has been updated.
References
Godzik A (1996) The structural alignment between two proteins: is there a unique answer? Protein Sci 5:1325–1338
Sela I, Ashkenazy H, Katoh K, Pupko T (2015) GUIDANCE2: accurate detection of unreliable alignment regions accounting for the uncertainty of multiple parameters. Nucleic Acids Res 43:W7–W14
Morrison DA, Ellis JT (1997) Effects of nucleotide sequence alignment on phylogeny estimation: a case study of 18S rDNAs of apicomplexa. Mol Biol Evol 14:428–441
Ogden TH, Rosenberg MS (2006) Multiple sequence alignment accuracy and phylogenetic inference. Syst Biol 55:314–328
Wong KM, Suchard MA, Huelsenbeck JP (2008) Alignment uncertainty and genomic analysis. Science 319:473–476
Lunter G, Rocco A, Mimouni N, Heger A, Caldeira A, Hein J (2008) Uncertainty in homology inferences: assessing and improving genomic sequence alignment. Genome Res 18:298–309
Herman JL, Novák Á, Lyngsø R, Szabó A, Miklós I, Hein J (2015) Efficient representation of uncertainty in multiple sequence alignments using directed acyclic graphs. BMC Bioinformatics 16:108
Nelesen S, Liu K, Zhao D, Linder CR, Warnow T (2008) The effect of the guide tree on multiple sequence alignments and subsequent phylogenetic analyses. In: Proceedings of the 2008 Pacific Symposium on Biocomputing. World Scientific. p 25–36
Lunter G, Drummond AJ, Miklós I, Hein J (2005) Statistical alignment: recent progress, new applications, and challenges. In: Statistical Methods in Molecular Evolution. Statistics for Biology and Health. Springer, New York, NY
Redelings BD, Suchard MA (2005) Joint Bayesian estimation of alignment and phylogeny. Syst Biol 54:401–418
Westesson O, Lunter G, Paten B, Holmes I (2012) Accurate reconstruction of insertion-deletion histories by statistical phylogenetics. PLoS One 7:e34572
Holmes IH (2017) Historian: accurate reconstruction of ancestral sequences and evolutionary rates. Bioinformatics 33:1227–1229
Redelings BD (2014) Erasing errors due to alignment ambiguity when estimating positive selection. Mol Biol Evol 31:1979–1993
Satija R, Pachter L, Hein J (2008) Combining statistical alignment and phylogenetic footprinting to detect regulatory elements. Bioinformatics 24:1236–1242
Satija R, Novák Á, Miklós I, Lyngsø R, Hein J (2009) BigFoot: Bayesian alignment and phylogenetic footprinting with MCMC. BMC Evol Biol 9:217
Philippe H, Brinkmann H, Lavrov DV, Littlewood DTJ, Manuel M, Wörheide G, Baurain D (2011) Resolving difficult phylogenetic questions: why more sequences are not enough. PLoS Biol 9:e1000602
Kumar S, Filipski AJ, Battistuzzi FU, Kosakovsky Pond SL, Tamura K (2012) Statistics and truth in phylogenomics. Mol Biol Evol 29:457–472
Talavera G, Castresana J (2007) Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst Biol 56:564–577
Wu M, Chatterji S, Eisen JA (2012) Accounting for alignment uncertainty in phylogenomics. PLoS One 7:e30288
Gatesy J, DeSalle R, Wheeler W (1993) Alignment-ambiguous nucleotide sites and the exclusion of systematic data. Mol Phylogenet Evol 2:152–157
Lee MS (2001) Unalignable sequences and molecular evolution. Trends Ecol Evol 16:681–685
Löytynoja A, Goldman N (2008) Phylogeny-aware gap placement prevents errors in sequence alignment and evolutionary analysis. Science 320:1632–1635
Hasegawa H, Holm L (2009) Advances and pitfalls of protein structural alignment. Curr Opin Struct Biol 19:341–348
Johnson MS, Šali A, Blundell TL (1990) Phylogenetic relationships from three-dimensional protein structures. Methods Enzymol 183:670–690
Bujnicki JM (2000) Phylogeny of the restriction endonuclease-like superfamily inferred from comparison of protein structures. J Mol Evol 50:39–44
Lundin D, Poole AM, Sjöberg B-M, Högbom M (2012) Use of structural phylogenetic networks for classification of the ferritin-like superfamily. J Biol Chem 287:20565–20575
Chothia C, Lesk AM (1986) The relation between the divergence of sequence and structure in proteins. EMBO J 5:823
Panchenko AR, Wolf YI, Panchenko LA, Madej T (2005) Evolutionary plasticity of protein families: coupling between sequence and structure variation. Proteins 61:535–544
Illergård K, Ardell DH, Elofsson A (2009) Structure is three to ten times more conserved than sequence: a study of structural response in protein cores. Proteins 77:499–508
Echave J, Spielman SJ, Wilke CO (2016) Causes of evolutionary rate variation among protein sites. Nat Rev Genet 17:109–121
Worth CL, Gong S, Blundell TL (2009) Structural and functional constraints in the evolution of protein families. Nat Rev Mol Cell Biol 10:709–720
Gilson AI, Marshall-Christensen A, Choi J-M, Shakhnovich EI (2017) The role of evolutionary selection in the dynamics of protein structure evolution. Biophys J 112:1350–1365
Choi SC, Hobolth A, Robinson DM, Kishino H, Thorne JL (2007) Quantifying the impact of protein tertiary structure on molecular evolution. Mol Biol Evol 24:1769–1782
Kleinman CL, Rodrigue N, Lartillot N, Philippe H (2010) Statistical potentials for improved structurally constrained evolutionary models. Mol Biol Evol 27:1546–1560
Rodrigue N, Philippe H, Lartillot N (2006) Assessing site-interdependent phylogenetic models of sequence evolution. Mol Biol Evol 23:1762–1775
Sadowski M, Taylor W (2010) On the evolutionary origins of “fold space continuity”: a study of topological convergence and divergence in mixed alpha-beta domains. J Struct Biol 172:244–252
Rackovsky S (2015) Nonlinearities in protein space limit the utility of informatics in protein biophysics. Proteins 83:1923–1928
Sadreyev RI, Kim B-H, Grishin NV (2009) Discrete–continuous duality of protein structure space. Curr Opin Struct Biol 19:321–328
Holzgräfe C, Wallin S (2014) Smooth functional transition along a mutational pathway with an abrupt protein fold switch. Biophys J 107:1217–1225
Challis CJ, Schmidler SC (2012) A stochastic evolutionary model for protein structure alignment and phylogeny. Mol Biol Evol 29:3575–3587
Herman JL, Challis CJ, Novák Á, Hein J, Schmidler SC (2014) Simultaneous Bayesian estimation of alignment and phylogeny under a joint model of protein sequence and structure. Mol Biol Evol 31:2251–2266
Novák Á, Miklós I, Lyngsø R, Hein J (2008) StatAlign: an extendable software package for joint Bayesian estimation of alignments and evolutionary trees. Bioinformatics 24:2403–2404
Burmester T, Ebner B, Weich B, Hankeln T (2002) Cytoglobin: a novel globin type ubiquitously expressed invertebrate tissues. Mol Biol Evol 19:416–421
de Sanctis D, Dewilde S, Pesce A, Moens L, Ascenzi P, Hankeln T, Burmester T, Bolognesi M (2004) Crystal structure of cytoglobin: the fourth globin type discovered in man displays heme hexa-coordination. J Mol Biol 336:917–927
Hoffmann FG, Opazo JC, Storz JF (2010) Gene cooption and convergent evolution of oxygen transport hemoglobins in jawed and jawless vertebrates. Proc Natl Acad Sci U S A 107:14274–14279
Hoffmann FG, Opazo JC, Storz JF (2011) Differential loss and retention of cytoglobin, myoglobin, and globin-e during the radiation of vertebrates. Genome Biol Evol 3:588–600
Hoffmann FG, Opazo JC, Hoogewijs D, Hankeln T, Ebner B, Vinogradov SN, Bailly X, Storz JF (2012) Evolution of the globin gene family in deuterostomes: lineage-specific patterns of diversification and attrition. Mol Biol Evol 29:1735–1745
Geyer C (2011) Importance sampling, simulated tempering, and umbrella sampling. In: Brooks S, Gelman A, Jones G, Meng X (eds) Handbook of Markov Chain Monte Carlo. Chapman & Hall/CRC, Boca Raton, pp 295–311
Altekar G, Dwarkadas S, Huelsenbeck JP, Ronquist F (2004) Parallel Metropolis coupled Markov chain Monte Carlo for Bayesian phylogenetic inference. Bioinformatics 20:407–415
Thorne JL, Kishino H, Felsenstein J (1992) Inching toward reality: an improved likelihood model of sequence evolution. J Mol Evol 34:3–16
Gelman A, Rubin DB (1992) Inference from iterative simulation using multiple sequences. Stat Sci 7:457–472
Humphrey W, Dalke A, Schulten K (1996) VMD: visual molecular dynamics. J Mol Graph 14:33–38
Hoy JA, Robinson H, Trent JT, Kakar S, Smagghe BJ, Hargrove MS (2007) Plant hemoglobins: a molecular fossil record for the evolution of oxygen transport. J Mol Biol 371:168–179
Lobanov M, Bogatyreva N, Galzitskaia O (2008) Radius of gyration is indicator of compactness of protein structure. Mol Biol 42:701–706
Christensen AB, Herman JL, Elphick MR, Kober KM, Janies D, Linchangco G, Semmens DC, Bailly X, Vinogradov SN, Hoogewijs D (2015) Phylogeny of echinoderm hemoglobins. PLoS One 10:e0129668
Gupta KJ, Hebelstrup KH, Mur LA, Igamberdiev AU (2011) Plant hemoglobins: important players at the crossroads between oxygen and nitric oxide. FEBS Lett 585:3843–3849
Hargrove MS, Brucker EA, Stec B, Sarath G, Arredondo-Peter R, Klucas RV, Olson JS, Phillips GN (2000) Crystal structure of a nonsymbiotic plant hemoglobin. Structure 8:1005–1014
Sharir-Ivry A, **a Y (2017) The impact of native state switching on protein sequence evolution. Mol Biol Evol 34:1378–1390
Maadooliat M, Zhou L, Najibi SM, Gao X, Huang JZ (2016) Collective estimation of multiple bivariate density functions with application to angular-sampling-based protein loop modeling. J Am Stat Assoc 111:43–56
Golden M, García-Portugués E, Sørensen M, Mardia KV, Hamelryck T, Hein J (2017) A generative angular model of protein structure evolution. Mol Biol Evol 34:2085–2100
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Herman, J.L. (2019). Enhancing Statistical Multiple Sequence Alignment and Tree Inference Using Structural Information. In: Sikosek, T. (eds) Computational Methods in Protein Evolution. Methods in Molecular Biology, vol 1851. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8736-8_10
Download citation
DOI: https://doi.org/10.1007/978-1-4939-8736-8_10
Published:
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-8735-1
Online ISBN: 978-1-4939-8736-8
eBook Packages: Springer Protocols