DeepBSRPred: deep learning-based binding site residue prediction for proteins

  • Original Article
  • Amino Acids

Abstract

Motivation

Protein–protein interactions (PPIs) govern several cellular activities. Amino acid residues located at the interface are known as binding sites, and information about binding sites helps to understand the binding affinities and functions of protein–protein complexes.

Results

We have developed a deep neural network-based method, DeepBSRPred, for predicting binding sites using protein sequence information and predicted structures from AlphaFold2. The sequence- and structure-based features include the position-specific scoring matrix (PSSM), solvent accessible surface area, conservation scores, amino acid properties and residue depth. Our method predicted binding sites with an average F1 score of 0.73 on a dataset of 1236 proteins. Further, we compared its performance with existing methods in the literature on four benchmark datasets, and our method outperformed them.
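As an illustration of the evaluation measures named above and in the abbreviations (F1 and MCC), the following minimal Python sketch computes both from per-residue binary labels. The function names and the toy labels are hypothetical and are not taken from the DeepBSRPred code; the formulas are the standard definitions of the F1-score and the Matthews correlation coefficient.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = binding site, 0 = non-binding)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy example (hypothetical labels for eight residues of one protein).
y_true = [1, 0, 0, 1, 1, 0, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
f1 = f1_score(tp, fp, fn)
m = mcc(tp, fp, tn, fn)
print(f"TP={tp} FP={fp} TN={tn} FN={fn} F1={f1:.3f} MCC={m:.3f}")
```

Because binding site residues are usually a minority of all residues, measures such as F1 and MCC tend to be more informative than plain accuracy for this task.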

Availability and implementation

The DeepBSRPred web server can be found at https://web.iitm.ac.in/bioinfo2/deepbsrpred/index.html, along with all datasets used in this study. The trained models, the DeepBSRPred standalone source code, and the feature computation pipeline are freely available at https://web.iitm.ac.in/bioinfo2/deepbsrpred/download.html.

Data availability

The data used in this work are available at https://web.iitm.ac.in/bioinfo2/deepbsrpred/download.html.

Abbreviations

An-Ab: Antigen–antibody
EC: Enzyme containing
GP: G-protein containing
IN: Inhibitor containing
RC: Receptor containing
MS: Miscellaneous
ASA: Accessible surface area
AUROC: Area under the receiver operating characteristic curve
AUPRC: Area under the precision-recall curve
PSSM: Position-specific scoring matrix
F1: F1-score
MCC: Matthews correlation coefficient
Polar real: ASA of polar residues
BIOV880102: Information value for accessibility (Biou et al. 1988)
NADH010102: Hydropathy scale based on self-information values in the two-state model (Naderi-Manesh et al. 2001)
VALDAR: Protein conservation metrics (Valdar and Thornton 2001)
dASA: Solvent accessible surface area for protein unfolding
PONJ960101: Average volumes of residues (Pontius et al. 1996)
FASG760101: Molecular weight (Fasman 1976)
GRAR740103: Volume (Grantham 1974)
HB acceptor: Hydrogen bond acceptor
ASAD: Solvent accessible surface area for denatured protein (Gromiha et al. 1999)
ASAN: Solvent accessible surface area for native protein (Gromiha et al. 1999)
TAYLOR_GAPS: Conservation score (Taylor 1986)
PSSM sum: Summation of PSSM values
Residue depth: Residue depth, computed using Python
SMERFS: Conservation from AAcon tool (Manning et al. 2008)
Contact count: Number of contacts of the residue
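Two of the features listed above, PSSM sum and contact count, can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names, the use of one representative coordinate per residue, and the distance cutoff are hypothetical choices, and the actual DeepBSRPred feature pipeline may define these quantities differently.

```python
import math


def pssm_sum(pssm_row):
    """Sum the PSSM values in the row of a single residue."""
    return sum(pssm_row)


def contact_counts(coords, cutoff=8.0):
    """Count, for each residue, the other residues within `cutoff` angstroms.

    `coords` holds one (x, y, z) tuple per residue, for example a C-alpha
    position. The 8.0 default is an assumed value, not the one used by the
    authors.
    """
    n = len(coords)
    counts = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                counts[i] += 1
                counts[j] += 1
    return counts


# Toy data (hypothetical): one PSSM row and four residue positions on a line.
row = [-1, 2, 0, 5, -3]
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0), (20.0, 0.0, 0.0)]

row_total = pssm_sum(row)
counts = contact_counts(coords, cutoff=5.0)
print(row_total, counts)
```

The pairwise loop is quadratic in the number of residues, which is adequate for single chains; a spatial index would be a natural replacement for very large structures.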

References

  • Abadi M, Agarwal A et al (2016) Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467

  • Agnieszka G, Peter V et al (2018) AACon: A Fast Amino Acid Conservation Calculation Service. https://www.compbio.dundee.ac.uk/aacon/

  • Al-Rfou R, Alain G et al. (2016) Theano: a Python framework for fast computation of mathematical expressions. Comput Sci. abs/1605.02688

  • Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25(17):3389–3402. https://doi.org/10.1093/nar/25.17.3389

  • Amos-Binks A, Patulea C et al (2011) Binding site prediction for protein-protein interactions and novel motif discovery using re-occurring polypeptide sequences. BMC Bioinform 12:225

  • Asadabadi EB, Abdolmaleki P (2013) Predictions of protein-protein interfaces within membrane protein complexes. Avicenna J Med Biotechnol 5:148–157

  • Asgari E, Mofrad MR (2015) Continuous distributed representation of biological sequences for deep proteomics and genomics. PLoS ONE 10:e0141287

  • Asgari E, McHardy AC et al (2019) Probabilistic variable-length segmentation of protein sequences for discriminative motif discovery (DiMotif) and sequence embedding (ProtVecX). Sci Rep 9:3577

  • Biou V, Gibrat JF et al (1988) Secondary structure prediction: combination of three different methods. Protein Eng Des Sel 2(3):185–191

  • Branco P, Torgo L (2016) A survey of predictive modeling on imbalanced domains. ACM Computing Surveys (CSUR) 49(2):1–50

  • Cao B, Porollo A et al (2006) Enhanced recognition of protein transmembrane domains with prediction-based structural profiles. Bioinformatics 22:303–309

  • Chakravarty S, Varadarajan R (1999) Residue depth: a novel parameter for the analysis of protein structure and stability. Structure 7:723–732

  • Chen X, Jeong JC (2009) Sequence-based prediction of protein interaction sites with an integrative method. Bioinformatics 25:585–591

  • Chen P, Li J (2010) Sequence-based identification of interface residues by an integrative profile combining hydrophobic and evolutionary information. BMC Bioinformatics 11:402

  • Chollet F (2015) Keras: Deep learning library for theano and tensorflow. URL: https://keras.io/k, 7(8), T1.

  • Clark JJ, Orban ZJ et al (2020) Predicting binding sites from unbound versus bound protein structures. Sci Rep 10(1):15856

  • Dhole K, Singh G et al (2014) Sequence-based prediction of protein-protein interaction sites with L1-logreg classifier. J Theor Biol 348:47–54

  • Du X, Cheng J, Song J (2009) Improved prediction of protein binding sites from sequences using genetic algorithm. Protein J 28(6):273–280. https://doi.org/10.1007/s10930-009-9192-1

  • Fasman GD (1976) Handbook of Biochemistry and Molecular Biology. Proteins. CRC Press, Cleveland

  • Geng H, Lu T et al (2015) Prediction of protein-protein interaction sites based on naive Bayes classifier. Biochem Res Int 2015:1–7

  • Grantham R (1974) Amino acid difference formula to help explain protein evolution. Science 185(4154):862–864

  • Gromiha MM, Oobatake M et al (1999) Important amino acid properties for enhanced thermostability from mesophilic to thermophilic proteins. Biophys Chem 82(1):51–67

  • Gromiha MM, Yokota K et al (2009) Identification and analysis of binding site residues in protein-protein complexes. Int J Biol Biomed 3(9):415–420

  • Gromiha MM, Saranya N et al (2011) Sequence and structural features of binding site residues in protein-protein complexes: comparison with protein-nucleic acid complexes. Proteome Science 9(Suppl 1):S13

  • Heinzinger M, Elnaggar A et al (2019) Modeling aspects of the language of life through transfer-learning protein sequences. BMC Bioinformatics 20:723

  • Hubbard SJ, Thornton JM (1993) ‘NACCESS’, computer program. Department of Biochemistry and Molecular Biology, University College, London

  • Hwang H, Petrey D et al (2016) A hybrid method for protein–protein interface prediction. Protein Sci 25:159–165

  • Jia J, Liu Z et al (2016) iPPBS-Opt: a sequence-based ensemble classifier for identifying protein-protein binding sites by optimizing imbalanced training datasets. Molecules 21:95

  • Jones DT, Buchan DW et al (2012) PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments. Bioinformatics 28:184–190

  • Jumper J, Evans R et al (2021) Highly accurate protein structure prediction with AlphaFold. Nature 596(7873):583–589

  • Kabsch W, Sander C (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22(12):2577–2637

  • Kawashima S, Pokarowski P et al (2008) AAindex: amino acid index database progress report. Nucleic Acids Res 36(Database issue):D202–D205

  • Konc J, Janezic D (2007) Protein-protein binding-sites prediction by protein surface structure conservation. J Chem Inf Model 47(3):940–944

  • Laine E, Carbone A (2015) Local geometry and evolutionary conservation of protein surfaces reveal the multiple recognition patches in protein-protein interactions. PLoS Comput Biol 11:e1004580

  • Li Y, Golding GB et al (2021) DELPHI: accurate deep ensemble model for protein interaction sites prediction. Bioinformatics 37(7):896–904

  • Liang S, Zhang J et al (2004) Prediction of the interaction site on the surface of an isolated protein structure by analysis of side chain energy scores. Proteins 57(3):548–557

  • Lijnzaad P, Berendsen HJ, Argos P (1996) Hydrophobic patches on the surfaces of protein structures. Proteins 25(3):389–397

  • Lise S, Archambeau C et al (2009) Prediction of hot spot residues at protein-protein interfaces by combining machine learning and energy-based methods. BMC Bioinform 10:365

  • Liu GH, Shen HB et al (2016) Prediction of protein–protein interaction sites with machine-learning-based data-cleaning and post-filtering procedures. J Membr Biol 249:141–153

  • London N, Movshovitz-Attias D et al (2010) The structural basis of peptide-protein binding strategies. Structure 18:188–199

  • Ma B, Elkayam T et al (2003) Protein-protein interactions: structurally conserved residues distinguish between binding sites and exposed protein surfaces. Proc Natl Acad Sci USA 100(10):5772–5777

  • Maheshwari S, Brylinski M (2015) Prediction of protein–protein interaction sites from weakly homologous template structures using meta-threading and machine learning. J Mol Recognit 28:35–48

  • Maheshwari S, Brylinski M (2016) Template-based identification of protein–protein interfaces using eFindSitePPI. Methods 93:64–71

  • Manning JR, Jefferson ER et al (2008) The contrasting properties of conservation and correlated phylogeny in protein functional residue prediction. BMC Bioinform 9:51

  • McDonald IK, Thornton JM (1994) Satisfying hydrogen bonding potential in proteins. J Mol Biol 238(5):777–793

  • Murakami Y, Mizuguchi K (2010) Applying the Naïve Bayes classifier with kernel density estimation to the prediction of protein-protein interaction sites. Bioinformatics 26:1841–1848

  • Naderi-Manesh H, Sadeghi M et al (2001) Prediction of protein surface accessibility with information theory. Proteins 42(4):452–459

  • Neuvirth H, Raz R et al (2004) ProMate: a structure-based prediction program to identify the location of protein-protein binding sites. J Mol Biol 338(1):181–199

  • Ofran Y, Rost B (2007) ISIS: interaction sites identified from sequence. Bioinformatics 23:e13–e16

  • Pedregosa F, Varoquaux G et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830

  • Pontius J, Richelle J et al (1996) Deviations from standard atomic volumes as a quality measure for protein crystal structures. J Mol Biol 264(1):121–136

  • Porollo A, Meller J (2007) Prediction-based fingerprints of protein-protein interactions. Proteins: Structure, Function, and Bioinformatics 66:630–645

  • Saito T, Rehmsmeier M (2015) The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 10(3):e0118432

  • Singh G, Dhole K et al. (2014) SPRINGS: prediction of protein-protein interaction sites using artificial neural networks. Technical report. PeerJ PrePrints, PPR39858

  • Sun Y, Wong AKC, Kamel MS (2009) Classification of imbalanced data: A review. Int J Pattern Recognit Artif Intell 23(4):687–719. https://doi.org/10.1142/S0218001409007326

  • Taherzadeh G, Yang Y, Zhang T, Liew AW, Zhou Y (2016) Sequence-based prediction of protein-peptide binding sites using support vector machine. J Comput Chem 37(13):1223–1229. https://doi.org/10.1002/jcc.24314

  • Taylor WR (1986) The classification of amino acid conservation. J Theor Biol 119(2):205–218

  • Thomas CN, Anja B et al (2018) IntPred: a structure-based predictor of protein–protein interaction sites. Bioinformatics 34:223–229

  • Valdar WS, Thornton JM (2001) Conservation helps to identify biologically relevant crystal contacts. J Mol Biol 313(2):399–416. https://doi.org/10.1006/jmbi.2001.5034

  • Valdar WS (2002) Scoring residue conservation. Proteins: Struct Funct Bioinform 48:227–241

  • Varadi M, Anyango S et al (2022) AlphaFold protein structure database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res 50(D1):D439–D444

  • Viloria SJ, Allega MF, Lambrughi M, Papaleo E (2017) An optimal distance cutoff for contact-based protein structure networks using side-chain centers of mass. Sci Rep 7:1–11

  • Wang G, Dunbrack RL (2003) PISCES: a protein sequence culling server. Bioinformatics 19(12):1589–1591

  • Wang X, Yu B (2019) Protein–protein interaction sites prediction by ensemble random forests with synthetic minority oversampling technique. Bioinformatics 35:2395–2402

  • Wang DD, Wang R et al (2014) Fast prediction of protein–protein interaction sites based on extreme learning machines. Neurocomputing 128:258–266

  • Wei Z, Han K et al (2016) Protein–protein interaction sites prediction by ensembling SVM and sample-weighted random forests. Neurocomputing 193:201–212

  • Wei ZS, Yang JY, Shen HB, Yu DJ (2015) A cascade random forests algorithm for predicting protein-protein interaction sites. IEEE Trans Nanobiosci 14(7):746–760. https://doi.org/10.1109/TNB.2015.2475359

  • Xie Z, Deng X et al (2020) Prediction of protein–protein interaction sites using convolutional neural network and improved data sets. Int J Mol Sci 21:467

  • Xingyu G, Zhenyu C et al (2016) Adaptive weighted imbalance learning with application to abnormal activity recognition. Neurocomputing 173:1927–1935

  • Xue LC, Dobbs D et al (2011) HomPPI: a class of sequence homology-based protein-protein interface prediction methods. BMC Bioinformatics 12:244

  • Zardecki C, Dutta S et al (2022) PDB-101: Educational resources supporting molecular explorations through biology and medicine. Protein Sci 31(1):129–140

  • Zeng M, Zhang F et al (2019) Protein–protein interaction site prediction through combining local and global features with deep neural networks. Bioinformatics 36:1114–1120

  • Zhang J, Kurgan L (2019) Scriber: accurate and partner type-specific prediction of protein-binding residues from proteins sequences. Bioinformatics 35:i343–i353

  • Zhang B, Li J et al (2019) Sequence-based prediction of protein-protein interaction sites by simplified long short-term memory network. Neurocomputing 357:86–100

Acknowledgements

We thank the Indian Institute of Technology Madras and the High-Performance Computing Environment (HPCE) for computational facilities. The work is partially supported by the Department of Science and Technology, Government of India (No. DST/INT/SWD/P-05/2016).

Author information

Contributions

Conceptualization: MMG; methodology: MMG; software/code: RN; investigation: RN, KY; discussion: RN, KY, MMG; writing original draft: RN; review & editing: MMG, KY; supervision: MMG. All authors read and approved the manuscript.

Corresponding author

Correspondence to M. Michael Gromiha.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare.

Additional information

Handling editor: F. Eisenhaber.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Nikam, R., Yugandhar, K. & Gromiha, M.M. DeepBSRPred: deep learning-based binding site residue prediction for proteins. Amino Acids 55, 1305–1316 (2023). https://doi.org/10.1007/s00726-022-03228-3

  • DOI: https://doi.org/10.1007/s00726-022-03228-3