A Text Mining Protocol for Mining Biological Pathways and Regulatory Networks from Biomedical Literature

  • Protocol
Biomedical Text Mining

Part of the book series: Methods in Molecular Biology ((MIMB,volume 2496))


Abstract

A biological pathway or regulatory network is a collection of molecular regulators that, through a series of interactions among molecules, trigger changes in cellular processes and lead to the assembly of new molecules. Three types of pathway are central to systems biology studies: signaling pathways, metabolic pathways, and genetic pathways (gene regulatory networks). Recently, constructing biological pathways from the scientific literature has received considerable attention, because the literature contains a rich set of linguistic features from which biological associations between genes and proteins can be extracted. These associations can be combined to construct biological networks. Here, we present a brief overview of the various biological pathways, the biomedical text resources and corpora for network construction, and state-of-the-art methods for network construction, followed by our hybrid text mining protocol for extracting pathways and regulatory networks from biomedical literature.
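
As a rough illustration of how extracted associations might be combined into a network, the sketch below merges pairwise associations into a directed, labeled graph. It is not the chapter's protocol: the triple format, the entity names (GeneA to GeneD), the relation labels, and the helper functions build_network and downstream are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical associations, in the form a text mining step might emit:
# (source entity, relation type, target entity). Placeholder names only.
associations = [
    ("GeneA", "activates", "GeneB"),
    ("GeneB", "inhibits", "GeneC"),
    ("GeneA", "activates", "GeneB"),  # same association seen in a second sentence
    ("GeneC", "binds", "GeneD"),
]


def build_network(triples):
    """Merge pairwise associations into a directed, labeled network.

    Returns a dict mapping source -> {(relation, target): support count},
    where the count is how many times that association was seen.
    """
    network = defaultdict(lambda: defaultdict(int))
    for source, relation, target in triples:
        network[source][(relation, target)] += 1
    # Convert to plain dicts so lookups do not silently create entries.
    return {source: dict(edges) for source, edges in network.items()}


def downstream(network, start):
    """Return every entity reachable from `start` by following edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _relation, target in network.get(node, {}):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


network = build_network(associations)
```

Counting repeated associations gives each edge a simple support score, which could later be used to filter out relations seen only once. Calling downstream(network, "GeneA") then lists the entities that GeneA can influence through any chain of extracted relations.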




Copyright information

© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol


Cite this protocol

Abdulkadhar, S., Natarajan, J. (2022). A Text Mining Protocol for Mining Biological Pathways and Regulatory Networks from Biomedical Literature. In: Raja, K. (eds) Biomedical Text Mining. Methods in Molecular Biology, vol 2496. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2305-3_8


  • DOI: https://doi.org/10.1007/978-1-0716-2305-3_8


  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-2304-6

  • Online ISBN: 978-1-0716-2305-3

  • eBook Packages: Springer Protocols
