Prediction of Protein Phosphorylation Sites by Integrating Secondary Structure Information and Other One-Dimensional Structural Properties

Prediction of Protein Secondary Structure

Part of the book series: Methods in Molecular Biology ((MIMB,volume 1484))

Abstract

Studies on phosphorylation are important but challenging for both wet-bench experiments and computational studies, and accurate non-kinase-specific prediction tools are highly desirable for whole-genome annotation in a wide variety of species. Here, we describe a phosphorylation site prediction webserver, PhosphoSVM, that employs a support vector machine (SVM) to combine protein secondary structure information with seven other one-dimensional structural properties: Shannon entropy, relative entropy, predicted protein disorder information, predicted solvent accessible area, amino acid overlapping properties, averaged cumulative hydrophobicity, and subsequence k-nearest neighbor profiles. In tenfold cross-validation on animal phosphorylation sites, the method achieved AUC values of 0.8405, 0.8183, and 0.7383 for serine (S), threonine (T), and tyrosine (Y) sites, respectively. The model trained on the animal phosphorylation sites was also applied to a plant phosphorylation site dataset as an independent test, where the AUC values were 0.7761, 0.6652, and 0.5958 for S, T, and Y sites, respectively. The algorithm with the optimally trained model was implemented as a webserver. The webserver, trained model, and all datasets used in the current study are available at http://sysbio.unl.edu/PhosphoSVM.
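
Two of the conservation features named above, Shannon entropy and relative entropy, can be illustrated for a single alignment column. The following is a minimal sketch using the textbook definitions only; the abstract does not give the exact formulation, weighting, or background distribution used by PhosphoSVM, so the function names, the gap handling, and the uniform background here are assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: a uniform background. A real pipeline would more likely use
# database-derived amino acid frequencies.
UNIFORM_BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def column_frequencies(column):
    """Return amino acid frequencies for one alignment column.

    Gaps and non-standard symbols are ignored (an illustrative choice).
    """
    counts = Counter(c for c in column.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("column contains no standard amino acids")
    return {aa: counts.get(aa, 0) / total for aa in AMINO_ACIDS}


def shannon_entropy(freqs):
    """Shannon entropy in bits: H = -sum(p * log2(p)) over p > 0."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


def relative_entropy(freqs, background):
    """Relative entropy (Kullback-Leibler divergence) in bits.

    D = sum(p * log2(p / q)) over p > 0; every such q must be non-zero.
    """
    return sum(
        p * math.log2(p / background[aa]) for aa, p in freqs.items() if p > 0
    )


if __name__ == "__main__":
    # Hypothetical columns: fully conserved, two residues, eight residues.
    for column in ("SSSSSSSS", "SSSSTTTT", "ACDEFGHI"):
        freqs = column_frequencies(column)
        h = shannon_entropy(freqs)
        d = relative_entropy(freqs, UNIFORM_BACKGROUND)
        print(f"{column}  H={h:.3f} bits  D={d:.3f} bits")
```

A fully conserved column has zero Shannon entropy and the largest relative entropy against a uniform background, so the two scores move in opposite directions as conservation increases. With a non-uniform background the two are no longer simple complements, which is one reason to keep both as separate features.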

Acknowledgement

This project was supported by funding under CZ’s startup funds from the University of Nebraska, Lincoln, NE. This work was completed utilizing the Holland Computing Center of the University of Nebraska.

Author information

Correspondence to Chi Zhang.

Copyright information

© 2017 Springer Science+Business Media New York

About this protocol

Cite this protocol

Dou, Y., Yao, B., Zhang, C. (2017). Prediction of Protein Phosphorylation Sites by Integrating Secondary Structure Information and Other One-Dimensional Structural Properties. In: Zhou, Y., Kloczkowski, A., Faraggi, E., Yang, Y. (eds) Prediction of Protein Secondary Structure. Methods in Molecular Biology, vol 1484. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6406-2_18

  • DOI: https://doi.org/10.1007/978-1-4939-6406-2_18

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-6404-8

  • Online ISBN: 978-1-4939-6406-2

  • eBook Packages: Springer Protocols
