Abstract
Proteins bind metal ions such as copper, zinc, and magnesium for purposes that include importing, exporting, or transporting metals to other parts of the cell as ligands, and maintaining the stable structure that proteins need to function properly. A metal binding site is the single amino acid position at which a protein binds a metal ion. Manually identifying metal binding sites is expensive, laborious, and time-consuming. Only a tiny fraction of the millions of proteins in UniProtKB, the most comprehensive protein database, are annotated with metal binding sites, leaving many millions of proteins awaiting annotation. Developing a computational pipeline is thus essential to keep pace with the growing number of proteins. A significant shortcoming of existing computational methods is that they do not account for long-range dependencies among residues. Other weaknesses include low accuracy, absence of positional information, reliance on hand-engineered features, and restriction to a pre-determined set of residues and metal ions. In this paper, we propose MetaLLM, a metal binding site prediction technique that leverages recent progress in self-supervised, attention-based (e.g., Transformer) large language models (LLMs) and the considerable number of publicly available protein sequences. LLMs are capable of modelling long-range residue dependencies in a sequence. The proposed MetaLLM uses a transformer pre-trained on an extensive database of protein sequences and later fine-tuned on metal-binding proteins for multi-label metal ion prediction. A stratified 10-fold cross-validation shows more than 90% precision for the most prevalent metal ions. Moreover, a comparative performance analysis confirms the superiority of the proposed MetaLLM over classical machine-learning techniques.
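To make the idea of residue-wise, multi-label prediction concrete, the following is a minimal sketch and not the authors' implementation. It assumes a small illustrative ion vocabulary (`METAL_IONS`) and two hypothetical helpers, `encode_targets` and `per_ion_precision`; the abstract does not specify the actual label set, data format, or evaluation code. It shows how each residue can carry one binary label per metal ion, and how precision can be computed separately for each ion.

```python
from typing import Dict, List, Optional, Set

# Hypothetical label vocabulary, chosen only for illustration;
# the abstract does not list the ions used in the paper.
METAL_IONS: List[str] = ["ZN", "CU", "MG"]


def encode_targets(sequence: str, sites: Dict[int, Set[str]]) -> List[List[int]]:
    """Build a len(sequence) x len(METAL_IONS) binary matrix.

    `sites` maps 0-based residue positions to the set of ions bound there.
    A residue may bind more than one ion, hence the multi-label encoding.
    """
    targets = [[0] * len(METAL_IONS) for _ in sequence]
    for position, ions in sites.items():
        if not 0 <= position < len(sequence):
            raise ValueError(f"position {position} is outside the sequence")
        for ion in ions:
            targets[position][METAL_IONS.index(ion)] = 1
    return targets


def per_ion_precision(
    y_true: List[List[int]], y_pred: List[List[int]]
) -> Dict[str, Optional[float]]:
    """Precision = TP / (TP + FP) per ion; None when the ion is never predicted."""
    result: Dict[str, Optional[float]] = {}
    for column, ion in enumerate(METAL_IONS):
        tp = fp = 0
        for true_row, pred_row in zip(y_true, y_pred):
            if pred_row[column] == 1:
                if true_row[column] == 1:
                    tp += 1
                else:
                    fp += 1
        result[ion] = tp / (tp + fp) if tp + fp else None
    return result


# Toy example: a 4-residue sequence with zinc at position 1
# and both zinc and copper at position 2.
sequence = "MHCA"
y_true = encode_targets(sequence, {1: {"ZN"}, 2: {"ZN", "CU"}})
# Made-up predictions standing in for a model's thresholded outputs.
y_pred = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [0, 0, 0]]
precision = per_ion_precision(y_true, y_pred)
```

In a fine-tuning setup of the kind the abstract describes, a matrix like `y_true` would serve as the per-residue target for a classification head on top of the pre-trained transformer; that wiring is omitted here because the abstract gives no architectural details.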
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Shishir, F.S., Sarker, B., Rahman, F., Shomaji, S. (2023). MetaLLM: Residue-Wise Metal Ion Prediction Using Deep Transformer Model. In: Rojas, I., Valenzuela, O., Rojas Ruiz, F., Herrera, L.J., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2023. Lecture Notes in Computer Science, vol 13920. Springer, Cham. https://doi.org/10.1007/978-3-031-34960-7_4
DOI: https://doi.org/10.1007/978-3-031-34960-7_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-34959-1
Online ISBN: 978-3-031-34960-7
eBook Packages: Computer Science (R0)