Abstract
Cell type deconvolution is a computational method for estimating cell type proportions from bulk sequencing data, and is frequently used to analyse divergent cell types in tumour tissue samples. However, deconvolution of cell types from proteomic data is still in its infancy, owing to challenges with repeatability and reproducibility, variable reference standards and the lack of single-cell proteomic reference data. Here we develop scpDeconv, a deep-learning-based deconvolution method designed specifically for proteomic data. scpDeconv uses an autoencoder to leverage information from bulk proteomic data to improve the quality of single-cell proteomic data, and employs a domain adversarial architecture to bridge the single-cell and bulk data distributions and transfer labels from single-cell data to bulk data. Extensive experiments validate the performance of scpDeconv in deconvolving proteomic data from various species and sources and produced by different proteomic technologies. This method should find broad applicability in areas including tumour microenvironment interpretation and clinical diagnosis and classification.
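The abstract does not say how the domain adversarial architecture is trained. A common realization, assumed here and not confirmed by this section, is the gradient reversal layer of domain-adversarial neural networks (DANN): it is the identity in the forward pass and multiplies gradients by −λ in the backward pass, so the shared encoder is pushed to make reference (single-cell-derived) and bulk samples indistinguishable while the domain classifier still learns to separate them. The toy sketch below uses a scalar encoder weight and hand-derived gradients; all names (`GradientReversal`, `encoder_gradient`, `lam`) are hypothetical.

```python
import math


class GradientReversal:
    """Identity in the forward pass; scales incoming gradients by -lam in the backward pass."""

    def __init__(self, lam: float) -> None:
        self.lam = lam

    def forward(self, z: float) -> float:
        return z

    def backward(self, grad_out: float) -> float:
        return -self.lam * grad_out


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def encoder_gradient(w: float, a: float, b: float,
                     x_src: float, y_src: float, x_tgt: float, lam: float) -> float:
    """Gradient of the total loss w.r.t. the shared encoder weight w in a toy model.

    Encoder: z = w * x.
    Label head (source sample only): y_hat = a * z, squared-error loss.
    Domain head (both samples): logit b * z, binary cross-entropy with
    label 1 for the source (reference) sample and 0 for the target (bulk) sample.
    Only the encoder gradient is returned; a and b would be updated normally.
    """
    grl = GradientReversal(lam)

    # Label branch: d/dw (a*w*x - y)^2 = 2 * (a*z - y) * a * x
    z_src = w * x_src
    grad_w = 2.0 * (a * z_src - y_src) * a * x_src

    # Domain branch: for BCE on logit t = b*z, dL/dt = sigmoid(t) - label
    for x, label in ((x_src, 1.0), (x_tgt, 0.0)):
        z = grl.forward(w * x)
        dloss_dz = (sigmoid(b * z) - label) * b
        # The reversal flips the sign before the gradient reaches the encoder
        grad_w += grl.backward(dloss_dz) * x
    return grad_w


# lam = 0: only the label loss drives the encoder.
# lam > 0: the encoder receives the negated domain gradient, i.e. it ascends the domain loss.
print(encoder_gradient(1.0, 0.5, 2.0, x_src=1.0, y_src=0.2, x_tgt=-1.0, lam=0.0))
print(encoder_gradient(1.0, 0.5, 2.0, x_src=1.0, y_src=0.2, x_tgt=-1.0, lam=1.0))
```

In a deep-learning framework the same effect is usually obtained with a custom autograd function, so the encoder, proportion predictor and domain classifier can be optimized jointly in a single backward pass.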
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1038%2Fs42256-023-00737-y/MediaObjects/42256_2023_737_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1038%2Fs42256-023-00737-y/MediaObjects/42256_2023_737_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1038%2Fs42256-023-00737-y/MediaObjects/42256_2023_737_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1038%2Fs42256-023-00737-y/MediaObjects/42256_2023_737_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1038%2Fs42256-023-00737-y/MediaObjects/42256_2023_737_Fig5_HTML.png)
Data availability
All data used in this study are publicly available, and their use is described in detail in the Methods. For the human breast atlas datasets22, the processed scRNA-seq dataset was downloaded from the GEO repository (accession no. GSE180878) and the processed CyTOF dataset from https://data.mendeley.com/datasets/vs8m5gkyfn/1. For the murine cell line datasets, the N2 dataset23 was downloaded from the MassIVE data repository (accession no. MSV000086809) and the nanoPOTS dataset16 from the MassIVE data repository (accession no. MSV000084110). For the human cell line datasets, the T-SCP dataset24 was downloaded from the PRIDE partner repository (accession no. PXD024043), the plexDIA dataset25 from https://scp.slavovlab.net/Derks_et_al_2022, the pSCoPE_Leduc dataset26 from https://scp.slavovlab.net/Leduc_et_al_2022, the SCoPE2_Leduc dataset26 from https://scp.slavovlab.net/Leduc_et_al_2021 and the pSCoPE_Huffman dataset27 from https://scp.slavovlab.net/Huffman_et_al_2022. The Deep Visual Proteomics (DVP) dataset36 was downloaded from the PRIDE partner repository (accession no. PXD023904). The bulk proteomic data and the clinical information for the melanoma samples were obtained from the Supplementary Information of the original publication35. Source data are provided with this paper.
Code availability
The code was implemented in Python and has been released on GitHub (https://github.com/TencentAILabHealthcare/scpDeconv) and Zenodo (https://doi.org/10.5281/zenodo.8278210)44 with detailed instructions. The comparison methods were implemented following their official guidelines, and their reference code is listed in the Methods.
References
Wang, X., Park, J., Susztak, K., Zhang, N. R. & Li, M. Bulk tissue cell type deconvolution with multi-subject single-cell expression reference. Nat. Commun. 10, 380 (2019).
Qian, J. et al. A pan-cancer blueprint of the heterogeneous tumor microenvironment revealed by single-cell profiling. Cell Res. 30, 745–762 (2020).
Azizi, E. et al. Single-cell map of diverse immune phenotypes in the breast tumor microenvironment. Cell 174, 1293–1308 (2018).
Frishberg, A. et al. Cell composition analysis of bulk genomics using single-cell data. Nat. Methods 16, 327–332 (2019).
Newman, A. M. et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat. Biotechnol. 37, 773–782 (2019).
Menden, K. et al. Deep learning-based cell composition analysis from tissue expression profiles. Sci. Adv. 6, eaba2619 (2020).
Cable, D. M. et al. Robust decomposition of cell type mixtures in spatial transcriptomics. Nat. Biotechnol. 40, 517–526 (2021).
Edwards, N. J. et al. The CPTAC data portal: a resource for cancer proteomics research. J. Proteome Res. 14, 2707–2713 (2015).
Marguerat, S. et al. Quantitative analysis of fission yeast transcriptomes and proteomes in proliferating and quiescent cells. Cell 151, 671–683 (2012).
Gygi, S. P., Rochon, Y., Franza, B. R. & Aebersold, R. Correlation between protein and mRNA abundance in yeast. Mol. Cell. Biol. 19, 1720–1730 (1999).
Liu, Y., Beyer, A. & Aebersold, R. On the dependency of cellular protein levels on mRNA abundance. Cell 165, 535–550 (2016).
Specht, H. & Slavov, N. Transformative opportunities for single-cell proteomics. J. Proteome Res. 17, 2565–2571 (2018).
Cheung, R. K. & Utz, P. J. CyTOF—the next generation of cell detection. Nat. Rev. Rheumatol. 7, 502–503 (2011).
Slavov, N. Unpicking the proteome in single cells: single-cell mass spectrometry will help reveal mechanisms that underpin health and disease. Science 367, 512–513 (2020).
Petelski, A. A. et al. Multiplexed single-cell proteomics using SCoPE2. Nat. Protoc. 16, 5398–5425 (2021).
Dou, M. et al. High-throughput single cell proteomics enabled by multiplex isobaric labeling in a nanodroplet sample preparation platform. Anal. Chem. 91, 13119–13127 (2019).
Patrick, E. et al. Deconvolving the contributions of cell-type heterogeneity on cortical gene expression. PLoS Comput. Biol. 16, e1008120 (2020).
Perkel, J. M. Single-cell proteomics takes centre stage. Nature 597, 580–582 (2021).
Doerr, A. Single-cell proteomics. Nat. Methods 16, 20 (2018).
Vanderaa, C. & Gatto, L. Replication of single-cell proteomics data reveals important computational challenges. Expert Rev. Proteomics 18, 835–843 (2021).
Zhang, H., Cisse, M., Dauphin, Y. N. & Lopez-Paz, D. mixup: beyond empirical risk minimization. Preprint at https://doi.org/10.48550/arXiv.1710.09412 (2017).
Gray, G. K. et al. A human breast atlas integrating single-cell proteomics and transcriptomics. Dev. Cell 57, 1400–1420.e7 (2022).
Woo, J. et al. High-throughput and high-efficiency sample preparation for single-cell proteomics using a nested nanowell chip. Nat. Commun. 12, 7075 (2021).
Brunner, A. et al. Ultra-high sensitivity mass spectrometry quantifies single-cell proteome changes upon perturbation. Mol. Syst. Biol. 18, e10798 (2022).
Derks, J. et al. Increasing the throughput of sensitive proteomics by plexDIA. Nat. Biotechnol. 41, 50–59 (2023).
Leduc, A., Huffman, R. G., Cantlon, J., Khan, S. & Slavov, N. Exploring functional protein covariation across single cells using nPOP. Genome Biol. 23, 261 (2022).
Huffman, R. G. et al. Prioritized mass spectrometry increases the depth, sensitivity and data completeness of single-cell proteomics. Nat. Methods 20, 714–722 (2023).
Chu, T., Wang, Z., Pe’er, D. & Danko, C. G. Cell type and gene expression deconvolution with BayesPrism enables Bayesian integrative analysis across bulk and single-cell RNA sequencing in oncology. Nat. Cancer 3, 505–517 (2022).
Lopez, R. et al. DestVI identifies continuums of cell types in spatial transcriptomics data. Nat. Biotechnol. 40, 1360–1369 (2022).
Warrener, R. et al. Tumor cell-specific cytotoxicity by targeting cell cycle checkpoints. FASEB J. 17, 1550–1552 (2003).
Li, J. & Stanger, B. Z. Cell cycle regulation meets tumor immunosuppression. Trends Immunol. 41, 859–863 (2020).
Schwartz, G. K. & Shah, M. A. Targeting the cell cycle: a new approach to cancer therapy. J. Clin. Oncol. 23, 9408–9421 (2005).
Balch, C. M. et al. Final version of 2009 AJCC melanoma staging and classification. J. Clin. Oncol. 27, 6199–6206 (2009).
Betancourt, L. H. et al. The Human Melanoma Proteome Atlas—complementing the melanoma transcriptome. Clin. Transl. Med. 11, e451 (2021).
Beck, L. et al. Clinical proteomics of metastatic melanoma reveals profiles of organ specificity and treatment resistance. Clin. Cancer Res. 27, 2074–2086 (2021).
Mund, A. et al. Deep Visual Proteomics defines single-cell identity and heterogeneity. Nat. Biotechnol. 40, 1231–1240 (2022).
Crowson, A. N., Magro, C. M. & Mihm, M. C. Prognosticators of melanoma, the melanoma report and the sentinel lymph node. Modern Pathol. 19, S71–S87 (2006).
Prognosis and Survival for Melanoma Skin Cancer (Canadian Cancer Society, 2023); https://cancer.ca/en/cancer-information/cancer-types/skin-melanoma/prognosis-and-survival
Ciarletta, P., Foret, L. & Amar, M. B. The radial growth phase of malignant melanoma: multi-phase modelling, numerical simulations and linear stability analysis. J. R. Soc. Interface 8, 345–368 (2011).
Śmiech, M., Leszczyński, P., Kono, H., Wardell, C. & Taniguchi, H. Emerging BRAF mutations in cancer progression and their possible effects on transcriptional networks. Genes 11, 1342 (2020).
Lu, H. et al. Oncogenic BRAF-mediated melanoma cell invasion. Cell Rep. 15, 2012–2024 (2016).
Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
Sherman, B. T. et al. DAVID: a web server for functional enrichment analysis and functional annotation of gene lists (2021 update). Nucleic Acids Res. 50, W216–W221 (2022).
Wang, F. Deep domain adversarial neural network for the deconvolution of cell type mixtures in tissue proteome profiling. Zenodo https://doi.org/10.5281/zenodo.8278210 (2023).
Acknowledgements
We thank C. Li for giving suggestions for the project. F.W. and G.W. were supported by the National Natural Science Foundation of China (62225109 and 62072095). F.Y. was supported by the Key-Area Research and Development Program of Guangdong Province (2021B0101420005).
Author information
Contributions
F.Y. and J.Y. conceived and designed the project. F.W. developed the algorithm and conducted the experiments under the supervision of F.Y. and J.Y. F.W. and F.Y. analysed the results. F.Y. and F.W. wrote the manuscript. F.W. prepared the figures under the guidance of F.Y. and J.Y. J.Y., G.W., R.A. and R.B.G. revised the manuscript. L.H. gave suggestions on domain adaptation and on improving the manuscript. J.S. provided suggestions for the project. W.L. revised and polished the figures. All authors reviewed and approved the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Fabian Coscia and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Liesbeth Venema, in collaboration with the Nature Machine Intelligence team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Overall performance comparison of cell type mixture deconvolution.
a, Pearson’s correlation values of scpDeconv, Scaden, MuSiC, BayesPrism and DestVI on the testing cases of the human breast atlas datasets (proteome to proteome, n = 6 cell types), the murine cell line datasets (n = 3 cell types) and the human cell line datasets (n = 4 cell types). Boxplots show the median (center lines), interquartile range (hinges) and 1.5 times the interquartile range (whiskers). b, Heatmap of the transcriptomic and proteomic profiles from donor A and the proteomic profile from donor B on 33 shared markers (x axis) and 6 cell types (y axis) studied in the human breast atlas datasets. AV: alveolar cells, HS: hormone-sensing cells, BA: basal cells, FI: fibroblasts, VA: vascular lymphatic cells, IM: immune cells. c, Root-mean-square error (RMSE) values of scpDeconv and the comparison methods when deconvolving pseudo-bulk proteomic data from donor B with single-cell transcriptomic data (triangles) or single-cell proteomic data (circles) from donor A as reference, for the human breast atlas datasets. d, t-SNE plot showing the distribution of the N2 and nanoPOTS datasets. e, Concordance correlation coefficient (CCC) values of scpDeconv on the murine cell line datasets (n = 3 cell types) with different numbers of samples in the reference data generated in stage 1. f, CCC values of scpDeconv on the murine cell line datasets (n = 3 cell types) with different numbers of cells per sample in the reference data generated in stage 1. g, RMSE values of scpDeconv on the murine cell line datasets (n = 3 cell types) with different numbers of samples in the reference data generated in stage 1. h, RMSE values of scpDeconv on the murine cell line datasets (n = 3 cell types) with different numbers of cells per sample in the reference data generated in stage 1. Boxplots in e–h are defined as in a.
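Panels e–h vary the number of samples and the number of cells per sample in the reference data generated in stage 1. The sketch below shows one common way to build such labelled pseudo-bulk samples from single-cell profiles. It assumes flat-Dirichlet mixing proportions and averaging of the pooled cells, which may differ from scpDeconv's actual stage-1 procedure, and the function and parameter names (`simulate_pseudo_bulk`, `n_samples`, `cells_per_sample`) are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Profile = List[float]


def simulate_pseudo_bulk(
    cells: Dict[str, List[Profile]],
    n_samples: int,
    cells_per_sample: int,
    seed: int = 0,
) -> List[Tuple[Profile, List[float]]]:
    """Return (pseudo-bulk profile, cell type proportions) pairs.

    cells maps each cell type to its single-cell protein profiles (all of equal length).
    Proportions are listed in sorted cell-type order.
    """
    rng = random.Random(seed)
    types = sorted(cells)
    n_features = len(cells[types[0]][0])
    samples = []
    for _ in range(n_samples):
        # Flat Dirichlet draw: normalize independent Exp(1) variables
        raw = [rng.expovariate(1.0) for _ in types]
        total = sum(raw)
        weights = [r / total for r in raw]
        # Assign a cell type to each pooled cell, then pick a random cell of that type
        drawn = rng.choices(types, weights=weights, k=cells_per_sample)
        profile = [0.0] * n_features
        for cell_type in drawn:
            cell = rng.choice(cells[cell_type])
            for j, value in enumerate(cell):
                profile[j] += value / cells_per_sample  # mean over the pooled cells
        # Labels are the realized fractions of pooled cells, not the sampled weights
        props = [drawn.count(t) / cells_per_sample for t in types]
        samples.append((profile, props))
    return samples


toy_cells = {"A": [[1.0, 0.0], [0.8, 0.2]], "B": [[0.0, 1.0]]}
for profile, props in simulate_pseudo_bulk(toy_cells, n_samples=3, cells_per_sample=20):
    print([round(v, 3) for v in profile], props)
```

Using the realized fractions as labels keeps each label exactly consistent with the cells that were actually pooled into its profile.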
Extended Data Fig. 2 Systematic analysis of missing-feature imputation in the human cell line datasets.
a, Venn diagrams showing the overlap of proteins from different sources in the reference group (left) and the target group (middle), and the overlap between the proteins shared within the reference group and those shared within the target group (right) in the human cell line datasets. b, Heatmap showing the normalized abundance of target-specific highly variable proteins (HVPs) in the target group. c, Heatmap showing the raw expression of all proteins, including missing features, in the reference pseudo-bulk data after stage 1 of scpDeconv. Proteins and samples were ordered by hierarchical clustering of the raw expression. d, Heatmap showing the imputed expression of all proteins, including missing features, in the reference pseudo-bulk data after stage 2 of scpDeconv. Proteins and samples are in the same order as in c. e, Heatmap showing the raw expression of all proteins in the target data. Proteins are in the same order as in c and d; samples were ordered by hierarchical clustering.
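Panels c and d contrast the reference pseudo-bulk data before and after the stage-2 imputation of missing features. A common pattern in autoencoder-based imputation, assumed here rather than taken from the paper, is to compute the reconstruction loss only over observed entries and then fill the missing entries from the reconstruction while keeping measured values unchanged. The helpers below (`masked_mse` and `fill_missing`, hypothetical names) mark missing values with `None`; the autoencoder itself is omitted, and `x_hat` stands in for its output.

```python
from typing import List, Optional


def masked_mse(x: List[Optional[float]], x_hat: List[float]) -> float:
    """Mean squared reconstruction error over observed (non-None) entries only,
    so missing proteins are excluded from the training signal."""
    observed = [(v, r) for v, r in zip(x, x_hat) if v is not None]
    if not observed:
        return 0.0
    return sum((v - r) ** 2 for v, r in observed) / len(observed)


def fill_missing(x: List[Optional[float]], x_hat: List[float]) -> List[float]:
    """Keep measured values and take the reconstruction only where x is missing."""
    return [r if v is None else v for v, r in zip(x, x_hat)]


measured = [1.0, None, 3.0]      # the second protein was not quantified in this sample
reconstructed = [1.5, 9.0, 3.0]  # stand-in for the autoencoder output
print(masked_mse(measured, reconstructed))
print(fill_missing(measured, reconstructed))
```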
Extended Data Fig. 3 Performance of cell-cycle state inference when deconvolving monocyte pseudo-samples with melanoma samples as reference.
a, Scatter plots of true (y axis) versus predicted (x axis) cell-cycle state proportions of monocyte pseudo-samples for all stages, the G1 stage, the S stage and the G2 stage (from left to right), predicted by scpDeconv with melanoma samples as reference. b–e, As in a, for Scaden (b), MuSiC (c), BayesPrism (d) and DestVI (e).
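Agreement between true and predicted proportions, as shown in these scatter plots, is summarized in Extended Data Fig. 1 by Pearson's correlation, the concordance correlation coefficient (CCC) and the root-mean-square error (RMSE). The sketch below gives the standard definitions (Lin's CCC with population moments). Whether the paper computes these per cell type across samples or per sample is not stated in this section, and the function names are hypothetical.

```python
import math
from typing import Sequence


def _mean(v: Sequence[float]) -> float:
    return sum(v) / len(v)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = _mean(x), _mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ccc(x: Sequence[float], y: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient, using 1/n (population) moments."""
    n = len(x)
    mx, my = _mean(x), _mean(y)
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2.0 * cov / (vx + vy + (mx - my) ** 2)


def rmse(x: Sequence[float], y: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


true_props = [0.1, 0.3, 0.6]
biased_pred = [p + 0.1 for p in true_props]  # perfectly correlated but offset
print(pearson(true_props, biased_pred), ccc(true_props, biased_pred), rmse(true_props, biased_pred))
```

A constant offset leaves Pearson's r at 1 but lowers CCC (to about 0.89 here) and raises RMSE to 0.1, which is why CCC and RMSE complement correlation when judging proportion estimates.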
Extended Data Fig. 4 Performance of cell-cycle state inference when deconvolving melanoma pseudo-samples with monocytes as reference.
a, Scatter plots of true (y axis) versus predicted (x axis) cell-cycle state proportions of melanoma pseudo-samples for all stages, the G1 stage, the S stage and the G2 stage (from left to right), predicted by scpDeconv with monocyte samples as reference. b–e, As in a, for Scaden (b), MuSiC (c), BayesPrism (d) and DestVI (e).
Extended Data Fig. 5 Survival analysis of the predicted cell class proportions of clinical melanoma samples.
a, Three cases illustrating the scpDeconv pipeline for deconvolving the clinical tissue proteomes of patients with metastatic melanoma. b, Kaplan–Meier plot of the predicted proportions (above and below median) of CD146-high melanoma for 174 melanoma patients. c, As in b, for CD146-low melanoma. d, As in b, for melanocytes. e, As in b, for stroma. In b–e, P values were computed using a two-sided log-rank test.
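Panels b–e split the 174 patients at the median predicted proportion of each cell class and compare the two groups with a two-sided log-rank test. The sketch below is a minimal pure-Python version of that procedure for illustration only. The names are hypothetical, ties at the median are assigned to the lower group (an assumption, since the caption does not say), and in practice a tested survival package such as lifelines (`lifelines.statistics.logrank_test`) would be preferable. The toy data are invented and unrelated to the study cohort.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def median_split(values: Sequence[float]) -> List[bool]:
    """True for samples above the median; ties at the median go to the low group."""
    m = statistics.median(values)
    return [v > m for v in values]


def logrank_test(times: Sequence[float], events: Sequence[int],
                 groups: Sequence[bool]) -> Tuple[float, float]:
    """Two-group log-rank test.

    times: follow-up time per patient; events: 1 if death observed, 0 if censored;
    groups: group membership (e.g. from median_split).
    Returns (chi-square statistic with 1 df, two-sided P value).
    """
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        # Patients still under observation at t (those censored at t count as at risk)
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups) if ti == t and e and g)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of the deaths in group 1 at time t
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance if variance > 0 else 0.0
    # Upper tail of chi-square(1) equals P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


proportions = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
events = [1] * 10
high = median_split(proportions)
print(logrank_test(times, events, high))
```

The same split-and-test step, applied within the BRAF-mutated and non-mutated subgroups, corresponds to the Kaplan–Meier panels of Extended Data Fig. 6.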
Extended Data Fig. 6 Comprehensive survival analysis by combining BRAF mutation and the predicted cell class proportions of clinical melanoma samples.
a, Distribution of predicted cell class proportions of melanoma samples grouped by BRAF mutation status (n = 90 melanoma samples without BRAF mutation; n = 64 melanoma samples with BRAF mutation). Boxplots show the median (center lines), interquartile range (hinges) and 1.5 times the interquartile range (whiskers). b, Forest plot showing the hazard ratios and 95% confidence intervals of BRAF mutation, vertical growth melanoma and radial growth melanoma in melanoma patients. Black squares represent the hazard ratios, and the horizontal bars span the 95% confidence interval of each hazard ratio estimate. c, Kaplan–Meier plot of the predicted proportions (above and below median) of vertical growth melanoma for melanoma samples without BRAF mutation. d, As in c, for radial growth melanoma. e, Kaplan–Meier plot of the predicted proportions (above and below median) of vertical growth melanoma for melanoma samples with BRAF mutation. f, As in e, for radial growth melanoma. In b–f, P values were computed using a two-sided log-rank test.
Supplementary information
Supplementary Table
Supplementary Table 1: Dataset description. Supplementary Table 2: Training parameters.
Source data
Source Data Fig. 2
Statistical source data.
Source Data Fig. 3
Statistical source data.
Source Data Fig. 4
Statistical source data.
Source Data Fig. 5
Statistical source data.
Source Data Extended Data Fig. 1
Statistical source data.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, F., Yang, F., Huang, L. et al. Deep domain adversarial neural network for the deconvolution of cell type mixtures in tissue proteome profiling. Nat Mach Intell 5, 1236–1249 (2023). https://doi.org/10.1038/s42256-023-00737-y
DOI: https://doi.org/10.1038/s42256-023-00737-y
- Springer Nature Limited