Abstract
Cancer remains a formidable challenge in medical research, and understanding its underlying mechanisms is central to precision medicine, which aims to tailor treatments to individuals’ genetic profiles. This study uses single-cell RNA sequencing (scRNA-seq), a next-generation sequencing technology, to examine the transcriptomes of individual cells across diverse populations. Our methodology provides insight into gene expression patterns, improving our understanding of cellular heterogeneity and its implications for cancer pathogenesis. To address the ‘curse of dimensionality’ inherent in high-dimensional data, we introduce a machine learning-based feature selection approach. This technique frames gene selection as a multi-label classification problem, focusing on identifying genes critical for distinguishing between disease states and cell types. Our strategy also underscores the value of data integration in reinforcing the statistical robustness of scRNA-seq analyses: by integrating disparate scRNA-seq datasets, we mitigate batch effects, yielding more accurate and reliable results and contributing to the advancement of precision medicine in oncology.
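The abstract does not specify the selection algorithm, so the following is only a minimal sketch of the general idea of scoring genes against more than one label set (disease state and cell type). It swaps in a simple filter-style score, the ratio of between-group to within-group sum of squares, summed over the label sets; this is an illustrative stand-in and not the authors' method. The function names, the toy data, and the gene names are all hypothetical.

```python
import random
from statistics import fmean


def variance_ratio(values, labels):
    """Between-group over within-group sum of squares for one gene and one label set."""
    overall = fmean(values)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    between = sum(len(g) * (fmean(g) - overall) ** 2 for g in groups.values())
    within = sum((v - fmean(g)) ** 2 for g in groups.values() for v in g)
    return between / (within + 1e-12)  # small constant avoids division by zero


def rank_genes(expression, label_sets):
    """Rank genes by their summed score across all label sets (highest first).

    expression: dict mapping gene name -> list of per-cell expression values
    label_sets: list of per-cell label lists (e.g. disease state, cell type)
    """
    scores = {
        gene: sum(variance_ratio(values, labels) for labels in label_sets)
        for gene, values in expression.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (assumption): 200 cells, two label sets, 20 uninformative genes
# plus one gene tied to disease state and one tied to cell type.
rng = random.Random(0)
n_cells = 200
disease = [rng.choice(["tumor", "normal"]) for _ in range(n_cells)]
cell_type = [rng.choice(["T", "B", "myeloid"]) for _ in range(n_cells)]

expression = {
    f"noise_{i}": [rng.gauss(0, 1) for _ in range(n_cells)] for i in range(20)
}
expression["gene_disease"] = [
    rng.gauss(3.0 if d == "tumor" else 0.0, 1) for d in disease
]
type_means = {"T": 0.0, "B": 3.0, "myeloid": 6.0}
expression["gene_celltype"] = [rng.gauss(type_means[c], 1) for c in cell_type]

top_genes = rank_genes(expression, [disease, cell_type])[:2]
print(top_genes)
```

Summing the per-label scores lets a gene rank highly if it separates either disease states or cell types. A supervised model with per-label importances would be a natural replacement for the variance ratio in a real analysis.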
Acknowledgement
The registration and publication costs for this work are funded by the Research Committee of the Ionian University, Special Account for Research Grants, project title: “Master’s program in Bioinformatics and Neuroinformatics”.
Copyright information
© 2024 IFIP International Federation for Information Processing
About this paper
Cite this paper
Lazaros, K., Exarchos, T., Maglogiannis, I., Vlamos, P., Vrahatis, A.G. (2024). Advancing ScRNA-Seq Data Integration via a Novel Gene Selection Method. In: Maglogiannis, I., Iliadis, L., Macintyre, J., Avlonitis, M., Papaleonidas, A. (eds) Artificial Intelligence Applications and Innovations. AIAI 2024. IFIP Advances in Information and Communication Technology, vol 711. Springer, Cham. https://doi.org/10.1007/978-3-031-63211-2_3
DOI: https://doi.org/10.1007/978-3-031-63211-2_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-63210-5
Online ISBN: 978-3-031-63211-2
eBook Packages: Computer Science, Computer Science (R0)