Abstract
Advances in experimental technology now generate enormous amounts of raw data, giving rise to groups of biologists who specialize in genome, proteome, transcriptome, expression, and pathway data, among others. The result is an exponential growth in the scientific literature, which is now beyond the reach of manual curation and annotation for extracting important information. Microarray data are expression data; their analysis yields lists of up- and downregulated genes, which are then functionally annotated to ascertain their biological meaning. These genes, represented as vocabularies and/or Gene Ontology terms and associated with pathway enrichment analysis, still need a relational and conceptual link to a disease. This chapter describes a hybrid approach that we designed for identifying novel drug–disease targets. Microarray data for muscular dystrophy are explored as an example, and text mining approaches are used with the aim of identifying promising novel drug targets. Our main objective is to give a basic overview from the perspective of biologists, for whom text mining approaches to data mining and information retrieval are a fairly new concept. The chapter aims to bridge the gap between biologists and computational text miners so that the two can work together toward more informative research in a fast, time-efficient manner.
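As a rough illustration of the step in which expression analysis yields up- and downregulated gene lists, the sketch below splits a table of differential expression results by fold change and adjusted p-value. It is not the chapter's protocol: the `GeneResult` record, the `split_regulated` function, the cutoff values, and the gene names are all hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, NamedTuple, Tuple


class GeneResult(NamedTuple):
    """One row of a differential expression table (hypothetical layout)."""
    symbol: str
    log2_fold_change: float
    adjusted_p: float


def split_regulated(
    results: Iterable[GeneResult],
    lfc_cutoff: float = 1.0,   # assumed cutoff, not taken from the chapter
    p_cutoff: float = 0.05,    # assumed cutoff, not taken from the chapter
) -> Tuple[List[str], List[str]]:
    """Return (upregulated, downregulated) gene symbols."""
    up: List[str] = []
    down: List[str] = []
    for row in results:
        # Skip genes whose change is not statistically significant.
        if row.adjusted_p >= p_cutoff:
            continue
        if row.log2_fold_change >= lfc_cutoff:
            up.append(row.symbol)
        elif row.log2_fold_change <= -lfc_cutoff:
            down.append(row.symbol)
    return up, down


# Made-up values for placeholder genes; not real expression data.
example = [
    GeneResult("GENE_A", 2.3, 0.001),
    GeneResult("GENE_B", -1.8, 0.01),
    GeneResult("GENE_C", 0.4, 0.02),
    GeneResult("GENE_D", 3.0, 0.20),
]

up, down = split_regulated(example)
print("up:", up)
print("down:", down)
```

The two resulting lists are the kind of input that would then go to functional annotation and to a literature search against the disease of interest.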
Copyright information
© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Manoharan, S., Iyyappan, O.R. (2022). A Hybrid Protocol for Finding Novel Gene Targets for Various Diseases Using Microarray Expression Data Analysis and Text Mining. In: Raja, K. (eds) Biomedical Text Mining. Methods in Molecular Biology, vol 2496. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2305-3_3
DOI: https://doi.org/10.1007/978-1-0716-2305-3_3
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2304-6
Online ISBN: 978-1-0716-2305-3
eBook Packages: Springer Protocols