Abstract
Single-cell RNA sequencing (scRNA-seq) data serve as the foundation for many studies of cellular heterogeneity. Many methods and evaluation metrics in single-cell research depend on cell labels. Annotating cells usually combines clustering with prior biological knowledge, so labels tend to be assigned from a cluster-level perspective rather than reflecting the heterogeneity of individual cells. To address this, we introduce scCoRR, a data-driven self-correction framework for labeled scRNA-seq data. The framework takes a supervised approach trained on partially reliable anchor cells, which eliminates the need for additional reference datasets or marker genes. A supervised deep neural network is then trained with a cross-entropy loss and a contrastive regularization term to predict the types of the remaining cells. During this process, the labels of some cells are corrected from one cell type to another, and these corrections can also be explained from several biological perspectives.
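The training objective and correction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the form of the contrastive term, so the supervised contrastive penalty below, the cosine similarity, the `temperature` and `weight` parameters, and the `is_anchor` flags are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, labels):
    """Mean cross-entropy of per-cell logits against integer cell-type labels."""
    losses = [-math.log(softmax(z)[y]) for z, y in zip(logits, labels)]
    return sum(losses) / len(losses)


def cosine(u, v):
    """Cosine similarity; assumes both embeddings are non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_term(embeddings, labels, temperature=0.5):
    """Assumed regularizer: cells sharing a label are treated as positive pairs."""
    n = len(embeddings)
    per_cell = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue  # a cell with no same-label partner contributes nothing
        sims = {j: math.exp(cosine(embeddings[i], embeddings[j]) / temperature)
                for j in others}
        denom = sum(sims.values())
        loss_i = -sum(math.log(sims[j] / denom) for j in positives) / len(positives)
        per_cell.append(loss_i)
    return sum(per_cell) / len(per_cell) if per_cell else 0.0


def total_loss(logits, embeddings, labels, weight=0.1):
    """Cross-entropy plus a weighted contrastive regularization term."""
    return cross_entropy(logits, labels) + weight * contrastive_term(embeddings, labels)


def correct_labels(probabilities, labels, is_anchor):
    """Keep anchor labels; assign every other cell its most probable type."""
    corrected = []
    for probs, label, anchor in zip(probabilities, labels, is_anchor):
        predicted = max(range(len(probs)), key=probs.__getitem__)
        corrected.append(label if anchor else predicted)
    return corrected
```

In this sketch, a non-anchor cell whose predicted type differs from its original label is the case the abstract describes as a label being corrected from one cell type to another.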
Acknowledgement
This work was supported by the National Natural Science Foundation of China under Grant Nos. 62202503 and 62225209, and by the Hunan Provincial Natural Science Foundation of China under Grant No. 2023JJ40780.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
He, Y., Liu, J., Li, M., Zheng, R. (2024). scCoRR: A Data-Driven Self-correction Framework for Labeled scRNA-Seq Data. In: Peng, W., Cai, Z., Skums, P. (eds) Bioinformatics Research and Applications. ISBRA 2024. Lecture Notes in Computer Science, vol 14955. Springer, Singapore. https://doi.org/10.1007/978-981-97-5131-0_5
DOI: https://doi.org/10.1007/978-981-97-5131-0_5
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-5130-3
Online ISBN: 978-981-97-5131-0
eBook Packages: Computer Science, Computer Science (R0)