Artificial Intelligence and Machine Learning in Bioinformatics

Advances in Bioinformatics

Abstract

Over the past decade, artificial intelligence (AI) and machine learning (ML) have emerged as the technologies most widely expected to transform research and development. This expectation is fueled in part by major advances in computing technology and by the concomitant lowering of barriers to collecting massive amounts of data. Meanwhile, the cost of researching, testing, manufacturing, and distributing new pharmaceuticals has risen. In light of these challenges, the pharmaceutical industry is interested in AI/ML methods because of their capacity for automation and prediction and the anticipated gain in efficiency. The use of ML techniques in the pharmaceutical industry has matured over the past 15 years. Clinical trial design, management, and analysis are the most recent steps in the drug development process to benefit from AI and ML. As AI/ML becomes increasingly integrated into R&D, it is essential to sort through the corresponding jargon and hype. Equally crucial is the understanding that the scientific method remains relevant for drawing conclusions from evidence. With both in mind, we can better evaluate the potential benefits of AI/ML in the pharmaceutical industry and make well-informed decisions about how best to apply them. The purpose of this chapter is to clarify certain fundamental ideas, provide some examples of their application, and offer perspective on how best to apply AI/ML techniques to research and development.



© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Alam, S., Israr, J., Kumar, A. (2024). Artificial Intelligence and Machine Learning in Bioinformatics. In: Singh, V., Kumar, A. (eds) Advances in Bioinformatics. Springer, Singapore. https://doi.org/10.1007/978-981-99-8401-5_16
