MetaLLM: Residue-Wise Metal Ion Prediction Using Deep Transformer Model

  • Conference paper
Bioinformatics and Biomedical Engineering (IWBBIO 2023)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 13920)

Abstract

Proteins bind metals such as copper, zinc, and magnesium for various purposes: importing, exporting, or transporting metals to other parts of the cell as ligands, and maintaining the stable structure a protein needs to function properly. A metal binding site is the single amino acid position at which a protein binds a metal ion. Identifying metal binding sites manually is expensive, laborious, and time-consuming. Only a tiny fraction of the millions of proteins in UniProtKB, the most comprehensive protein database, are annotated with metal binding sites, leaving many millions of proteins awaiting annotation. Developing a computational pipeline is thus essential to keep pace with the growing number of proteins. A significant shortcoming of existing computational methods is that they do not consider long-range dependencies between residues. Other weaknesses include low accuracy, absence of positional information, reliance on hand-engineered features, and restriction to a pre-determined set of residues and metal ions. In this paper, we propose MetaLLM, a metal binding site prediction technique that leverages recent progress in self-supervised attention-based (e.g. Transformer) large language models (LLMs) and the considerable number of publicly available protein sequences. LLMs are capable of modelling long-range residue dependencies in a sequence. MetaLLM uses a transformer pre-trained on an extensive database of protein sequences and then fine-tuned on metal-binding proteins for multi-label metal ion prediction. A stratified 10-fold cross-validation shows more than 90% precision for the most prevalent metal ions. Moreover, a comparative performance analysis confirms the superiority of MetaLLM over classical machine-learning techniques.
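As a rough illustration of what "multi-label metal ion prediction" and per-ion precision mean in practice, the sketch below encodes the ions bound at a site as a multi-hot vector, makes an independent sigmoid decision for each ion, and computes precision per ion. It is not the MetaLLM implementation: the label set `IONS`, the 0.5 threshold, the toy logits, and all function names are assumptions made for this example, and the logits stand in for the output of a fine-tuned transformer.

```python
import math

# Hypothetical label set for illustration only; not the paper's ion list.
IONS = ["ZN", "CU", "MG"]


def multi_hot(bound_ions, labels=IONS):
    """Encode the set of ions bound at a site as a 0/1 vector over `labels`."""
    return [1 if ion in bound_ions else 0 for ion in labels]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(logits, threshold=0.5):
    """Multi-label decision: each ion gets its own sigmoid and threshold,
    so a site may be assigned zero, one, or several ions."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


def per_label_precision(y_true, y_pred, labels=IONS):
    """Precision = TP / (TP + FP) per ion; None if the ion is never predicted."""
    result = {}
    for j, ion in enumerate(labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        result[ion] = tp / (tp + fp) if (tp + fp) else None
    return result


# Toy ground truth and toy model outputs (assumed values, not real data).
y_true = [
    multi_hot({"ZN"}),
    multi_hot({"ZN", "CU"}),
    multi_hot({"MG"}),
    multi_hot(set()),
]
logits = [
    [2.0, -1.0, -3.0],
    [1.5, 0.3, -2.0],
    [0.4, -2.0, 1.0],
    [-1.0, 0.2, -0.5],
]
y_pred = [predict(z) for z in logits]
precision = per_label_precision(y_true, y_pred)
```

In a cross-validated setting such as the one the abstract reports, a precision table like `precision` would be computed on each held-out fold and then averaged per ion.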

Notes

  1. https://github.com/DeepChainBio/bio-transformers.

  2. https://github.com/DeepChainBio/bio-transformers.

Author information

Corresponding author

Correspondence to Fairuz Shadmani Shishir.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Shishir, F.S., Sarker, B., Rahman, F., Shomaji, S. (2023). MetaLLM: Residue-Wise Metal Ion Prediction Using Deep Transformer Model. In: Rojas, I., Valenzuela, O., Rojas Ruiz, F., Herrera, L.J., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2023. Lecture Notes in Computer Science, vol 13920. Springer, Cham. https://doi.org/10.1007/978-3-031-34960-7_4

  • DOI: https://doi.org/10.1007/978-3-031-34960-7_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-34959-1

  • Online ISBN: 978-3-031-34960-7

  • eBook Packages: Computer Science, Computer Science (R0)
