Abstract
Next-generation sequencing technologies produce large volumes of DNA-seq and RNA-seq data. A central task in analysing these data is selecting the differentially expressed genes. The methods proposed for identifying genes distinguished by their expression levels in RNA-seq data are based mainly on statistical data analysis. These methods do not agree with one another, as distinct methods produce different results. The present paper proposes a new method for differential gene expression analysis based on a machine learning approach. The difficulty of selection caused by the large number of indistinguishable genes is addressed by an iterative clustering procedure. The importance of a proper cluster distance measure is discussed. The visibility of the procedure's results and its ability to find different numbers of compact clusters are also underlined. The significance of the method is investigated and demonstrated by applying it to a dataset of two mouse strains. The obtained results are compared with the results of statistical methods applied to the same dataset. It is concluded that the proposed method is valuable and can be applied either as a standalone method or for preliminary gene selection within a pipeline of statistical algorithms for discovering differentially expressed genes.
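The abstract does not specify the clustering algorithm, the distance measure, or the stopping rule, so the following is only a minimal sketch of the general idea of iteratively clustering genes and discarding the indistinguishable ones. It is not the paper's method. It assumes each gene has already been reduced to a single hypothetical score (for example, an absolute log fold change between the two strains), and it swaps in a plain one-dimensional two-means split. The names `two_means_1d`, `iterative_selection`, `n_rounds`, and the toy scores are all illustrative assumptions.

```python
from statistics import mean


def two_means_1d(values, n_iter=50):
    """Split 1-D values into two clusters with Lloyd's algorithm.

    Centres start at the minimum and maximum, so the result is deterministic.
    Returns a list of booleans, True where a value falls in the upper cluster.
    """
    lo, hi = min(values), max(values)
    upper = [abs(v - hi) < abs(v - lo) for v in values]
    for _ in range(n_iter):
        lower_vals = [v for v, u in zip(values, upper) if not u]
        upper_vals = [v for v, u in zip(values, upper) if u]
        if not lower_vals or not upper_vals:
            break  # all values identical, nothing to split
        lo, hi = mean(lower_vals), mean(upper_vals)
        new_upper = [abs(v - hi) < abs(v - lo) for v in values]
        if new_upper == upper:
            break  # assignments stopped changing
        upper = new_upper
    return upper


def iterative_selection(scores, n_rounds=1):
    """Repeatedly cluster the genes and keep only the higher-scoring cluster.

    scores: dict mapping gene name to a hypothetical per-gene score.
    n_rounds: assumed stopping rule (a fixed number of rounds).
    """
    candidates = dict(scores)
    for _ in range(n_rounds):
        if len(candidates) < 2:
            break
        genes = list(candidates)
        upper = two_means_1d([candidates[g] for g in genes])
        kept = {g: candidates[g] for g, u in zip(genes, upper) if u}
        if not kept:
            break  # no separation found, keep the current candidates
        candidates = kept
    return candidates


# Toy data (invented for illustration): most genes score near zero.
scores = {"g1": 0.05, "g2": 0.1, "g3": 0.0, "g4": 0.2, "g5": 0.15,
          "g6": 1.0, "g7": 1.2, "g8": 3.0, "g9": 3.2}
selected = iterative_selection(scores, n_rounds=1)
print(sorted(selected))  # ['g8', 'g9']
```

Each extra round narrows the candidate set further, so the number of rounds acts as a crude stand-in for whatever stopping criterion the full paper defines.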
Acknowledgement
The result presented in this paper is part of the GATE project. The project has received funding from the European Union’s Horizon 2020 WIDESPREAD-2018–2020 TEAMING Phase 2 programme under Grant Agreement No. 857155 and Operational Programme Science and Education for Smart Growth under Grant Agreement No. BG05M2OP001–1.003–0002-C01.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Georgieva, O. (2022). Iterative Clustering for Differential Gene Expression Analysis. In: Rojas, I., Valenzuela, O., Rojas, F., Herrera, L.J., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2022. Lecture Notes in Computer Science, vol 13347. Springer, Cham. https://doi.org/10.1007/978-3-031-07802-6_33
DOI: https://doi.org/10.1007/978-3-031-07802-6_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-07801-9
Online ISBN: 978-3-031-07802-6
eBook Packages: Computer Science, Computer Science (R0)