Abstract
Present-day protein space is the result of 3.7 billion years of evolution, constrained by the underlying physicochemical qualities of the proteins. It is difficult to differentiate between evolutionary traces and the effects of physicochemical constraints. Nonetheless, as a rule of thumb, instances of structural reuse, found by focusing on structural similarity, are likely attributable to physicochemical constraints, whereas instances of sequence reuse, found by focusing on sequence similarity, may be more indicative of evolutionary relationships. Both types of relationships have been studied and can provide meaningful insights into protein biophysics and evolution, which in turn can lead to better algorithms for protein search, annotation, and perhaps even design.
In broad strokes, studies of protein space vary in the entities they represent, the similarity measure used to compare these entities, and the representation used. The entities can be, for example, protein chains, domains, supra-domains, or smaller protein sub-parts denoted themes. The measures of similarity between the entities can be based on sequence, structure, function, or any combination of these. The representation can be global, encompassing the whole space, or local, focusing on a particular region surrounding protein(s) of interest. Global representations include lists of grouped proteins, protein networks, and maps. Networks are the abstraction that is derived most directly from the similarity data: each node is a protein entity (e.g., a domain), and edges connect similar entities. Selecting the entities, the similarity measure, and the abstraction are three intertwined decisions: the similarity measures allow us to identify the entities, and the selection of entities influences what is a meaningful similarity measure. Similarly, we seek entities whose relationships to one another can be described succinctly and accurately by a simple representation. This chapter covers studies that rely on different entities, similarity measures, and a range of representations to better understand protein structure space. Scholars may use publicly available navigators that offer a global representation, in particular the hierarchical classifications SCOP, CATH, and ECOD, or a local representation, which encompasses structural alignment algorithms. Alternatively, scholars can configure their own navigator using existing tools. To demonstrate this DIY (do it yourself) approach for navigating in protein space, we investigate substrate-binding proteins.
By presenting sequence similarities among this large and diverse protein family as a network, we can infer that one member (PDB ID 4ntl; of as yet unknown function) may bind methionine, and we suggest a putative binding mechanism.
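The network abstraction described above (nodes are protein entities, edges connect similar ones) can be illustrated with a minimal Python sketch. It is not the chapter's own pipeline: the entity names, similarity scores, threshold, and annotation labels below are made-up toy values, and the helper names `build_network`, `connected_components`, and `neighbor_annotations` are hypothetical.

```python
def build_network(similarities, threshold):
    """Adjacency map: nodes are entities; an edge joins pairs scoring >= threshold."""
    graph = {}
    for (a, b), score in similarities.items():
        graph.setdefault(a, set())  # keep nodes even if they end up isolated
        graph.setdefault(b, set())
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Group nodes that are linked by a path of similarity edges."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


def neighbor_annotations(graph, node, annotations):
    """Count the known annotations among a node's direct neighbors."""
    counts = {}
    for neighbor in graph.get(node, ()):
        label = annotations.get(neighbor)
        if label is not None:
            counts[label] = counts.get(label, 0) + 1
    return counts


# Toy, invented similarity scores between hypothetical entities A..E.
toy_scores = {("A", "B"): 0.9, ("B", "C"): 0.7, ("A", "C"): 0.2,
              ("C", "D"): 0.1, ("D", "E"): 0.8}
graph = build_network(toy_scores, threshold=0.5)
components = connected_components(graph)
# Hypothetical annotations: B is unannotated, its neighbors are annotated.
hints = neighbor_annotations(graph, "B", {"A": "ligand X", "C": "ligand X"})
print(components, hints)
```

In this toy setting, the annotations of a node's neighbors serve only as a hint about the unannotated node; any such inference would need to be checked against structural and biochemical evidence.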
Notes
1. Notice that the terms used here characterize the similarity measure, not the style of navigation in protein space, in keeping with the use of the same terms in the Needleman–Wunsch and Smith–Waterman sequence alignment algorithms.
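To make the distinction between the two alignment algorithms named in this note concrete, the following Python sketch computes a global (Needleman–Wunsch style) and a local (Smith–Waterman style) score for the same pair of strings. The scoring scheme (match +1, mismatch -1, linear gap -1) and the function names are illustrative assumptions; practical tools typically use substitution matrices and affine gap penalties.

```python
def global_score(s, t, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch style score: both sequences are aligned end to end."""
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        curr = [i * gap]
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(s, t, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman style score: the best-scoring pair of substrings."""
    best = 0
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        curr = [0]
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            # Clamping at zero lets an alignment start afresh at any cell.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


# Toy strings sharing one common segment embedded in unrelated flanks.
a, b = "XXXXHELLOXXXX", "YYHELLOYY"
print(global_score(a, b), local_score(a, b))
```

With these toy strings the local score rewards only the shared segment, whereas the global score is also penalized for the unrelated flanking regions.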
© 2019 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Narunsky, A., Ben-Tal, N., Kolodny, R. (2019). Navigating Among Known Structures in Protein Space. In: Sikosek, T. (eds) Computational Methods in Protein Evolution. Methods in Molecular Biology, vol 1851. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8736-8_12
DOI: https://doi.org/10.1007/978-1-4939-8736-8_12
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-8735-1
Online ISBN: 978-1-4939-8736-8
eBook Packages: Springer Protocols