Abstract
The biomedical literature describes many protein-protein interactions (PPIs) that are affected by genetic mutations. Automatically extracting such mutation-affected PPIs from the literature can help assess the clinical significance of gene variants, which plays a crucial role in realizing precision medicine. This paper proposes a novel Gaussian-enhanced representation model (GRM) for PPI extraction, which uses a Gaussian probability distribution to generate target entity representations on top of the pre-trained BioBERT model. GRM increases the weight of the target protein entities and their neighboring tokens, which addresses the problems of long input texts and widely scattered target entities in the PPI task. It also introduces a supervised contrastive learning method to improve the effectiveness and robustness of the model. Experimental results on the BioCreative VI dataset show that GRM achieves new state-of-the-art performance.
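The abstract names two mechanisms but does not give their formulas. As a rough illustration only, the sketch below assumes that "Gaussian enhancement" means weighting each token by an unnormalized Gaussian of its distance to the nearest target-protein token, then pooling the weighted token vectors (for example, BioBERT outputs) into one representation. The width `sigma`, the max-over-mentions rule and every function name here are hypothetical choices, not taken from the paper. The contrastive part follows the supervised contrastive loss of Khosla et al. (NeurIPS 2020), which may differ from the variant GRM actually uses.

```python
import math
from typing import List, Sequence


def gaussian_position_weights(
    seq_len: int, entity_positions: Sequence[int], sigma: float = 2.0
) -> List[float]:
    """Weight each token by its distance to the nearest target-entity token.

    Hypothetical rule: w_i = max_p exp(-(i - p)^2 / (2 * sigma^2)), so entity
    tokens get weight 1.0 and their neighbours decay smoothly with distance.
    """
    if not entity_positions:
        raise ValueError("at least one entity position is required")
    return [
        max(math.exp(-((i - p) ** 2) / (2 * sigma**2)) for p in entity_positions)
        for i in range(seq_len)
    ]


def gaussian_pool(
    hidden: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Weighted average of token vectors (e.g. encoder outputs)."""
    total = sum(weights)
    dim = len(hidden[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden)) / total for d in range(dim)]


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def supervised_contrastive_loss(
    embeddings: Sequence[Sequence[float]], labels: Sequence[int], temperature: float = 0.1
) -> float:
    """Supervised contrastive loss (Khosla et al., 2020), averaged over anchors.

    For each anchor i, same-label samples are positives and every other
    sample in the batch appears in the softmax denominator.
    """
    z = [_normalize(e) for e in embeddings]
    n = len(z)
    total, num_anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # anchors without positives contribute nothing
        logits = [_dot(z[i], z[k]) / temperature for k in range(n) if k != i]
        m = max(logits)  # log-sum-exp with max subtraction for stability
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += -sum(_dot(z[i], z[p]) / temperature - log_denom for p in positives) / len(positives)
        num_anchors += 1
    return total / num_anchors if num_anchors else 0.0
```

Under these assumptions, tokens far from both proteins receive weights near zero, so a long input that mentions the protein pair in only a few places still yields a representation dominated by the entity neighborhoods; `sigma` controls how wide that neighborhood is. The contrastive loss is low when same-label examples already point in the same direction and grows when they are pushed apart relative to other-label examples.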
Acknowledgment
This work is supported by a grant from the Natural Science Foundation of China (No. 62072070).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, D., Zhang, Y., Yang, M., Chen, F., Lu, M. (2022). Gaussian-Enhanced Representation Model for Extracting Protein-Protein Interactions Affected by Mutations. In: Bansal, M.S., Cai, Z., Mangul, S. (eds) Bioinformatics Research and Applications. ISBRA 2022. Lecture Notes in Computer Science, vol 13760. Springer, Cham. https://doi.org/10.1007/978-3-031-23198-8_28
DOI: https://doi.org/10.1007/978-3-031-23198-8_28
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23197-1
Online ISBN: 978-3-031-23198-8
eBook Packages: Computer Science, Computer Science (R0)