Abstract
With recent advances in precision medicine and healthcare computing, there is an enormous demand for developing machine learning algorithms in genomics that enable rapid analysis of genetic disorders. Technological advances in genomics and imaging provide clinicians with enormous amounts of data, but prediction remains largely subjective, resulting in problematic treatment decisions. Machine learning is now employed across several domains of the healthcare sector, including clinical research, early disease identification, and medical innovation. The main objective of this study is to identify patients who, based on several medical criteria, are more susceptible to having a genetic disorder. A genetic disease prediction algorithm was employed that leverages the patient's health history to estimate the probability of a genetic disorder diagnosis. We developed a computationally efficient machine learning approach to predict the overall lifespan of patients with a genomic disorder and to classify and predict patients with a genetic disease. The proposed model stacks a support vector machine (SVM), a random forest (RF), and an extra trees classifier (ETC) in a two-layer meta-estimator architecture. The first layer comprises the baseline models, each of which predicts outcomes from the dataset. The second layer is a meta-classifier that combines the first-layer predictions into the final prediction. Experimental results indicate that the model achieved an accuracy of 90.45% and a recall of 90.19%. The area under the curve (AUC) is 98.1% for mitochondrial diseases, 97.5% for multifactorial diseases, and 98.8% for single-gene inheritance disorders. The proposed approach offers a novel method for predicting patient prognosis in an unbiased, accurate, and comprehensive manner. In terms of identification accuracy, it outperforms human professionals who use the current clinical standard for genetic disease classification.
The implementation of stacked models will significantly advance biomedical research by improving the prediction of genetic diseases.
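The two-layer stacking scheme described in the abstract can be sketched with scikit-learn's `StackingClassifier`. This is a minimal illustration under stated assumptions and is not the authors' implementation.

The following choices are assumptions rather than details from the source:

- The meta-classifier is logistic regression, since the abstract does not name it.
- The hyperparameters are arbitrary.
- Recall is macro-averaged.
- Synthetic three-class data stands in for the Kaggle dataset.
- `CLASSES` is a hypothetical label set for the three disorder subclasses.

Per-class AUC is computed one-vs-rest, which is one common way to obtain class-wise AUCs like those reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler, label_binarize
from sklearn.svm import SVC

# Hypothetical labels for the three disorder subclasses named in the abstract.
CLASSES = ["mitochondrial", "multifactorial", "single_gene"]


def build_stacked_model(random_state=0):
    """Layer 1: SVM, RF, ETC. Layer 2: logistic regression (assumed meta-classifier)."""
    base_learners = [
        # SVMs are scale-sensitive, so features are standardised first.
        ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=random_state))),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=random_state)),
        ("etc", ExtraTreesClassifier(n_estimators=200, random_state=random_state)),
    ]
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        stack_method="predict_proba",  # meta-classifier is trained on class probabilities
        cv=5,  # out-of-fold base predictions keep layer 2 from seeing fitted-on labels
    )


def evaluate(seed=0):
    # Synthetic stand-in for tabular patient data (NOT the Kaggle dataset).
    X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                               n_classes=3, random_state=seed)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed)
    model = build_stacked_model(seed).fit(X_tr, y_tr)
    y_pred = model.predict(X_te)
    proba = model.predict_proba(X_te)
    # One-vs-rest AUC for each class (columns follow model.classes_ == [0, 1, 2]).
    y_bin = label_binarize(y_te, classes=[0, 1, 2])
    per_class_auc = {name: roc_auc_score(y_bin[:, i], proba[:, i])
                     for i, name in enumerate(CLASSES)}
    return {
        "accuracy": accuracy_score(y_te, y_pred),
        "macro_recall": recall_score(y_te, y_pred, average="macro"),
        "auc": per_class_auc,
    }


if __name__ == "__main__":
    print(evaluate())
```

Setting `cv=5` means the meta-classifier learns from out-of-fold probabilities. Predictions that the base learners make on their own training rows would be overconfident, so the meta-classifier avoids them.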
Availability of data and material
The dataset used in this study is publicly available on Kaggle (https://www.kaggle.com/datasets/aryarishabh/of-genomes-and-genetics-hackerearth-ml-challenge).
Funding
The authors thank the AIDA Lab, CCIS, Prince Sultan University, Riyadh, Saudi Arabia, for its support.
Author information
Contributions
A.R.: conceptualization, methodology. M.M.: software programming, validation, verification. T.S.: formal analysis, investigation. G.J.: resources, data curation, management.
Ethics declarations
Ethical approval
Not applicable
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Rehman, A., Mujahid, M., Saba, T. et al. Optimised stacked machine learning algorithms for genomics and genetics disorder detection in the healthcare industry. Funct Integr Genomics 24, 23 (2024). https://doi.org/10.1007/s10142-024-01289-z
DOI: https://doi.org/10.1007/s10142-024-01289-z