Hidden Markov Models for Protein Domain Homology Identification and Analysis

Jablonowski, Karl

doi:10.1007/978-1-4939-6762-9_3

Part of the book series: Methods in Molecular Biology (MIMB, volume 1555)

Abstract

Protein domain identification and analysis are cornerstones of modern proteomics. The tools available to protein domain researchers offer a variety of approaches to understanding large protein domain families. Hidden Markov models (HMMs) form the basis for identifying and categorizing evolutionarily linked protein domains. Here I describe the use of HMMs for predicting and identifying Src Homology 2 (SH2) domains within the proteome.

Acknowledgments

The knowledge amassed to write this chapter was based on work supported by the University of Chicago Cancer Research Foundation Women’s Board and Piers Nash’s laboratory at the University of Chicago Ben May Department for Cancer Research.

Author information

Authors and Affiliations

Division of Emergency Medicine, Department of Medicine, University of Washington, 325 9th Ave., Seattle, WA, USA
Karl Jablonowski
Division of Biomedical and Health Informatics, Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
Karl Jablonowski

Corresponding author

Correspondence to Karl Jablonowski.

Editor information

Editors and Affiliations

Department of Genetics and Genome Sciences, University of Connecticut School of Medicine, Farmington, Connecticut, USA
Kazuya Machida
Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
Bernard A. Liu

Cite this protocol

Jablonowski, K. (2017). Hidden Markov Models for Protein Domain Homology Identification and Analysis. In: Machida, K., Liu, B. (eds) SH2 Domains. Methods in Molecular Biology, vol 1555. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6762-9_3

DOI: https://doi.org/10.1007/978-1-4939-6762-9_3
Published: 15 January 2017
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-6760-5
Online ISBN: 978-1-4939-6762-9
eBook Packages: Springer Protocols
