A Hybrid Protocol for Finding Novel Gene Targets for Various Diseases Using Microarray Expression Data Analysis and Text Mining

  • Protocol
Biomedical Text Mining

Part of the book series: Methods in Molecular Biology (MIMB, volume 2496)

Abstract

Advances in experimental technology have produced enormous amounts of raw data, giving rise to subsets of biologists who specialize in genome, proteome, transcriptome, expression, pathway, and related data. This has led to exponential growth in the scientific literature, which is now beyond the means of manual curation and annotation for extracting important information. Microarray data are expression data; their analysis yields lists of up- and downregulated genes, which are functionally annotated to ascertain their biological meaning. These genes are represented as vocabularies and/or Gene Ontology terms which, when associated with pathway enrichment analysis, need to be understood relationally and conceptually with respect to a disease. This chapter describes a hybrid approach we designed for identifying novel drug–disease targets. Microarray data for muscular dystrophy are explored here as an example, and text mining approaches are used with the aim of identifying promising novel drug targets. Our main objective is to give a basic overview from the perspective of biologists, for whom text mining approaches to data mining and information retrieval are a fairly new concept. The chapter aims to bridge the gap between biologists and computational text miners and to bring them together for more informative research carried out in a fast, time-efficient manner.
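As a rough illustration of the two halves of such a hybrid approach, the sketch below first splits genes into up- and downregulated lists by log2 fold change, then counts how many abstracts mention each gene together with the disease term. It is a minimal sketch under stated assumptions, not the protocol described in the chapter: the gene names, expression values, abstracts, the cutoff of 1.0, and the helper names `differential_genes` and `literature_support` are all hypothetical, and simple co-occurrence counting stands in for whatever text mining tools the chapter actually uses.

```python
import math
import re
from statistics import mean


def differential_genes(expr, disease_cols, control_cols, lfc_cutoff=1.0):
    """Split genes into up/down lists by log2 fold change of group means.

    The cutoff is an illustrative assumption; real analyses also test
    statistical significance, which is omitted here.
    """
    up, down = [], []
    for gene, values in expr.items():
        disease_mean = mean(values[i] for i in disease_cols)
        control_mean = mean(values[i] for i in control_cols)
        lfc = math.log2(disease_mean / control_mean)
        if lfc >= lfc_cutoff:
            up.append(gene)
        elif lfc <= -lfc_cutoff:
            down.append(gene)
    return up, down


def literature_support(genes, abstracts, disease_term):
    """Count abstracts that mention both the gene and the disease term."""
    counts = {}
    for gene in genes:
        pattern = re.compile(r"\b" + re.escape(gene) + r"\b", re.IGNORECASE)
        counts[gene] = sum(
            1
            for text in abstracts
            if disease_term.lower() in text.lower() and pattern.search(text)
        )
    return counts


# Hypothetical toy data: columns 0-1 are disease samples, 2-3 are controls.
expression = {
    "GENE_A": [8.0, 8.0, 2.0, 2.0],
    "GENE_B": [1.0, 1.0, 4.0, 4.0],
    "GENE_C": [3.0, 3.0, 3.0, 3.0],
    "GENE_D": [10.0, 6.0, 2.0, 2.0],
}
toy_abstracts = [
    "GENE_A is elevated in muscular dystrophy patients.",
    "GENE_B regulates an unrelated pathway.",
]

up, down = differential_genes(expression, [0, 1], [2, 3])
support = literature_support(up + down, toy_abstracts, "muscular dystrophy")

# Differentially expressed genes with no co-mention are candidates for follow-up.
candidates = sorted(gene for gene, n in support.items() if n == 0)
print("up:", up, "down:", down)
print("candidates lacking literature support:", candidates)
```

Genes that are differentially expressed but never co-mentioned with the disease are only candidates for closer inspection; the absence of a co-mention in a small corpus does not by itself establish novelty.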


© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Manoharan, S., Iyyappan, O.R. (2022). A Hybrid Protocol for Finding Novel Gene Targets for Various Diseases Using Microarray Expression Data Analysis and Text Mining. In: Raja, K. (eds) Biomedical Text Mining. Methods in Molecular Biology, vol 2496. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2305-3_3

  • DOI: https://doi.org/10.1007/978-1-0716-2305-3_3

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-2304-6

  • Online ISBN: 978-1-0716-2305-3

  • eBook Packages: Springer Protocols
