Deep domain adversarial neural network for the deconvolution of cell type mixtures in tissue proteome profiling

From Nature Machine Intelligence

Abstract

Cell type deconvolution is a computational method for the determination/resolution of cell type proportions from bulk sequencing data, and is frequently used for the analysis of divergent cell types in tumour tissue samples. However, deconvolution technology is still in its infancy for the analysis of cell types using proteomic data due to challenges with repeatability/reproducibility, variable reference standards and the lack of single-cell proteomic reference data. Here we develop a deep-learning-based deconvolution method (scpDeconv) specifically designed for proteomic data. scpDeconv uses an autoencoder to leverage the information from bulk proteomic data to improve the quality of single-cell proteomic data, and employs a domain adversarial architecture to bridge the single-cell and bulk data distributions and transfer labels from single-cell data to bulk data. Extensive experiments validate the performance of scpDeconv in the deconvolution of proteomic data produced from various species/sources and different proteomic technologies. This method should find broad applicability to areas including tumour microenvironment interpretation and clinical diagnosis/classification.

Fig. 1: The architecture of our proposed method.
Fig. 2: Overall performance comparison of cell-type mixture deconvolution.
Fig. 3: Deconvolution performance on integrated human cell line datasets with different HVP numbers.
Fig. 4: Inference of cell cycle across cell lines.
Fig. 5: Application on clinical tissue proteomic data for metastatic melanoma.

Data availability

All data used in this study are publicly available, and their usage is described in the Methods. For the human breast atlas datasets [22], the processed scRNA-seq dataset was downloaded from the GEO repository (accession no. GSE180878) and the processed CyTOF dataset from https://data.mendeley.com/datasets/vs8m5gkyfn/1. For the murine cell line datasets, the N2 dataset [23] was downloaded from the MassIVE data repository (accession no. MSV000086809) and the nanoPOTS dataset [16] from the same repository (accession no. MSV000084110). For the human cell line datasets, the T-SCP dataset [24] was downloaded from the PRIDE partner repository (accession no. PXD024043), the plexDIA dataset [25] from https://scp.slavovlab.net/Derks_et_al_2022, the pSCoPE_Leduc dataset [26] from https://scp.slavovlab.net/Leduc_et_al_2022, the SCoPE2_Leduc dataset [26] from https://scp.slavovlab.net/Leduc_et_al_2021 and the pSCoPE_Huffman dataset [27] from https://scp.slavovlab.net/Huffman_et_al_2022. The Deep Visual Proteomics (DVP) dataset [36] was downloaded from the PRIDE partner repository (accession no. PXD023904). The bulk proteomic data and clinical information for the melanoma samples were obtained from the Supplementary Information of the original publication [35]. Source data are provided with this paper.

Code availability

The code was implemented in Python and has been released on GitHub (https://github.com/TencentAILabHealthcare/scpDeconv) and Zenodo (https://doi.org/10.5281/zenodo.8278210) [44] with detailed instructions. The comparison methods were implemented following their official guidance, and their reference code is listed in the Methods.

References

  1. Wang, X., Park, J., Susztak, K., Zhang, N. R. & Li, M. Bulk tissue cell type deconvolution with multi-subject single-cell expression reference. Nat. Commun. 10, 380 (2019).

  2. Qian, J. et al. A pan-cancer blueprint of the heterogeneous tumor microenvironment revealed by single-cell profiling. Cell Res. 30, 745–762 (2020).

  3. Azizi, E. et al. Single-cell map of diverse immune phenotypes in the breast tumor microenvironment. Cell 174, 1293–1308 (2018).

  4. Frishberg, A. et al. Cell composition analysis of bulk genomics using single-cell data. Nat. Methods 16, 327–332 (2019).

  5. Newman, A. M. et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat. Biotechnol. 37, 773–782 (2019).

  6. Menden, K. et al. Deep learning-based cell composition analysis from tissue expression profiles. Sci. Adv. 6, eaba2619 (2020).

  7. Cable, D. M. et al. Robust decomposition of cell type mixtures in spatial transcriptomics. Nat. Biotechnol. 40, 517–526 (2021).

  8. Edwards, N. J. et al. The CPTAC data portal: a resource for cancer proteomics research. J. Proteome Res. 14, 2707–2713 (2015).

  9. Marguerat, S. et al. Quantitative analysis of fission yeast transcriptomes and proteomes in proliferating and quiescent cells. Cell 151, 671–683 (2012).

  10. Gygi, S. P., Rochon, Y., Franza, B. R. & Aebersold, R. Correlation between protein and mRNA abundance in yeast. Mol. Cell. Biol. 19, 1720–1730 (1999).

  11. Liu, Y., Beyer, A. & Aebersold, R. On the dependency of cellular protein levels on mRNA abundance. Cell 165, 535–550 (2016).

  12. Specht, H. & Slavov, N. Transformative opportunities for single-cell proteomics. J. Proteome Res. 17, 2565–2571 (2018).

  13. Cheung, R. K. & Utz, P. J. CyTOF—the next generation of cell detection. Nat. Rev. Rheumatol. 7, 502–503 (2011).

  14. Slavov, N. Unpicking the proteome in single cells: single-cell mass spectrometry will help reveal mechanisms that underpin health and disease. Science 367, 512–513 (2020).

  15. Petelski, A. A. et al. Multiplexed single-cell proteomics using SCoPE2. Nat. Protoc. 16, 5398–5425 (2021).

  16. Dou, M. et al. High-throughput single cell proteomics enabled by multiplex isobaric labeling in a nanodroplet sample preparation platform. Anal. Chem. 91, 13119–13127 (2019).

  17. Patrick, E. et al. Deconvolving the contributions of cell-type heterogeneity on cortical gene expression. PLoS Comput. Biol. 16, e1008120 (2020).

  18. Perkel, J. M. Single-cell proteomics takes centre stage. Nature 597, 580–582 (2021).

  19. Doerr, A. Single-cell proteomics. Nat. Methods 16, 20 (2018).

  20. Vanderaa, C. & Gatto, L. Replication of single-cell proteomics data reveals important computational challenges. Expert Rev. Proteomics 18, 835–843 (2021).

  21. Zhang, H., Cisse, M., Dauphin, Y. N. & Lopez-Paz, D. mixup: beyond empirical risk minimization. Preprint at https://doi.org/10.48550/arXiv.1710.09412 (2017).

  22. Gray, G. K. et al. A human breast atlas integrating single-cell proteomics and transcriptomics. Dev. Cell 57, 1400–1420.e7 (2022).

  23. Woo, J. et al. High-throughput and high-efficiency sample preparation for single-cell proteomics using a nested nanowell chip. Nat. Commun. 12, 7075 (2021).

  24. Brunner, A. et al. Ultra-high sensitivity mass spectrometry quantifies single-cell proteome changes upon perturbation. Mol. Syst. Biol. 18, e10798 (2022).

  25. Derks, J. et al. Increasing the throughput of sensitive proteomics by plexDIA. Nat. Biotechnol. 41, 50–59 (2023).

  26. Leduc, A., Huffman, R. G., Cantlon, J., Khan, S. & Slavov, N. Exploring functional protein covariation across single cells using nPOP. Genome Biol. 23, 261 (2022).

  27. Huffman, R. G. et al. Prioritized mass spectrometry increases the depth, sensitivity and data completeness of single-cell proteomics. Nat. Methods 20, 714–722 (2023).

  28. Chu, T., Wang, Z., Pe’er, D. & Danko, C. G. Cell type and gene expression deconvolution with BayesPrism enables Bayesian integrative analysis across bulk and single-cell RNA sequencing in oncology. Nat. Cancer 3, 505–517 (2022).

  29. Lopez, R. et al. DestVI identifies continuums of cell types in spatial transcriptomics data. Nat. Biotechnol. 40, 1360–1369 (2022).

  30. Warrener, R. et al. Tumor cell-specific cytotoxicity by targeting cell cycle checkpoints. FASEB J. 17, 1550–1552 (2003).

  31. Li, J. & Stanger, B. Z. Cell cycle regulation meets tumor immunosuppression. Trends Immunol. 41, 859–863 (2020).

  32. Schwartz, G. K. & Shah, M. A. Targeting the cell cycle: a new approach to cancer therapy. J. Clin. Oncol. 23, 9408–9421 (2005).

  33. Balch, C. M. et al. Final version of 2009 AJCC melanoma staging and classification. J. Clin. Oncol. 27, 6199–6206 (2009).

  34. Betancourt, L. H. et al. The Human Melanoma Proteome Atlas—complementing the melanoma transcriptome. Clin. Transl. Med. 11, e451 (2021).

  35. Beck, L. et al. Clinical proteomics of metastatic melanoma reveals profiles of organ specificity and treatment resistance. Clin. Cancer Res. 27, 2074–2086 (2021).

  36. Mund, A. et al. Deep Visual Proteomics defines single-cell identity and heterogeneity. Nat. Biotechnol. 40, 1231–1240 (2022).

  37. Crowson, A. N., Magro, C. M. & Mihm, M. C. Prognosticators of melanoma, the melanoma report and the sentinel lymph node. Modern Pathol. 19, S71–S87 (2006).

  38. Prognosis and Survival for Melanoma Skin Cancer (Canadian Cancer Society, 2023); https://cancer.ca/en/cancer-information/cancer-types/skin-melanoma/prognosis-and-survival

  39. Ciarletta, P., Foret, L. & Amar, M. B. The radial growth phase of malignant melanoma: multi-phase modelling, numerical simulations and linear stability analysis. J. R. Soc. Interface 8, 345–368 (2011).

  40. Śmiech, M., Leszczyński, P., Kono, H., Wardell, C. & Taniguchi, H. Emerging BRAF mutations in cancer progression and their possible effects on transcriptional networks. Genes 11, 1342 (2020).

  41. Lu, H. et al. Oncogenic BRAF-mediated melanoma cell invasion. Cell Rep. 15, 2012–2024 (2016).

  42. Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).

  43. Sherman, B. T. et al. DAVID: a web server for functional enrichment analysis and functional annotation of gene lists (2021 update). Nucleic Acids Res. 50, W216–W221 (2022).

  44. Wang, F. Deep domain adversarial neural network for the deconvolution of cell type mixtures in tissue proteome profiling. Zenodo https://doi.org/10.5281/zenodo.8278210 (2023).

Acknowledgements

We thank C. Li for suggestions on the project. F.W. and G.W. were supported by the National Natural Science Foundation of China (62225109 and 62072095). F.Y. was supported by the Key-Area Research and Development Program of Guangdong Province (2021B0101420005).

Author information

Contributions

F.Y. and J.Y. conceived and designed the project. F.W. developed the algorithm and conducted experiments under the supervision of F.Y. and J.Y. F.W. and F.Y. analysed the results. F.Y. and F.W. wrote the manuscript. F.W. finished the figures under the guidance of F.Y. and J.Y. J.Y., G.W., R.A. and R.B.G. revised the manuscript. L.H. gave suggestions for the domain adaptation and improvement of the manuscript. J.S. provided suggestions for the project. W.L. revised and polished the figures. All authors reviewed and approved the manuscript.

Corresponding authors

Correspondence to Ruedi Aebersold, Guohua Wang or Jianhua Yao.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Fabian Coscia and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Liesbeth Venema, in collaboration with the Nature Machine Intelligence team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Overall performance comparison of cell type mixture deconvolution.

a, Pearson’s correlation values of scpDeconv, Scaden, MuSiC, BayesPrism and DestVI on the testing cases of the human breast atlas datasets (proteome to proteome, n = 6 cell types), murine cell line datasets (n = 3 cell types) and human cell line datasets (n = 4 cell types). b, Heatmap of the transcriptomic and proteomic profiles from donor A and the proteomic profile from donor B on 33 shared markers (x axis) and 6 cell types (y axis) studied in the human breast atlas datasets. AV, alveolar cells; HS, hormone-sensing cells; BA, basal cells; FI, fibroblasts; VA, vascular lymphatic cells; IM, immune cells. c, RMSE values of scpDeconv and the comparison methods when deconvolving pseudo-bulk proteomic data from donor B with single-cell transcriptomic data from donor A as reference (triangles) or with single-cell proteomic data from donor A as reference (circles) in the human breast atlas datasets. d, t-SNE plot showing the distribution of the N2 and nanoPOTS datasets. e,f, CCC values of scpDeconv on the murine cell line datasets (n = 3 cell types) for different sample numbers (e) and different cell numbers per sample (f) in the reference data generated in stage 1. g,h, RMSE values of scpDeconv on the murine cell line datasets (n = 3 cell types) for different sample numbers (g) and different cell numbers per sample (h) in the reference data generated in stage 1. In a and e–h, boxplots show the median (center lines), interquartile range (hinges) and 1.5 times the interquartile range (whiskers).
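
The three metrics named in this caption (Pearson's correlation, RMSE and CCC) can be computed for a pair of true and predicted proportion vectors as in the minimal sketch below. It assumes that CCC denotes Lin's concordance correlation coefficient, which the caption does not spell out. The function names are illustrative and are not taken from the scpDeconv code base.

```python
import numpy as np


def _as_vectors(y_true, y_pred):
    """Flatten both inputs to 1-D float arrays of equal length."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same number of entries")
    return y_true, y_pred


def pearson_r(y_true, y_pred):
    """Pearson's correlation between true and predicted proportions."""
    y_true, y_pred = _as_vectors(y_true, y_pred)
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted proportions."""
    y_true, y_pred = _as_vectors(y_true, y_pred)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def ccc(y_true, y_pred):
    """Lin's concordance correlation coefficient (assumed meaning of CCC)."""
    y_true, y_pred = _as_vectors(y_true, y_pred)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    # Population (biased) variances and covariance, as in Lin's definition.
    var_t, var_p = y_true.var(), y_pred.var()
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    return float(2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2))
```

Unlike Pearson's correlation, CCC penalizes a constant offset between predictions and ground truth, so a prediction that is perfectly correlated but shifted scores below 1.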

Extended Data Fig. 2 Systematic analysis of missing feature imputation in human cell line datasets.

a, Venn diagram showing the overlap of proteins from different sources in the reference group (left) or target group (middle), and the overlap between the reference groups’ shared proteins and the target groups’ shared proteins (right) in the human cell line datasets. b, Heatmap showing the normalized abundance of target-specific HVPs in the target group. c, Heatmap showing the raw expression of all proteins and missing features in the reference pseudo-bulk data after stage 1 of scpDeconv. Proteins and samples are ordered by hierarchical clustering on raw expression. d, Heatmap showing the imputed expression of all proteins and missing features in the reference pseudo-bulk data after stage 2 of scpDeconv. Proteins and samples are kept in the same order as in c. e, Heatmap showing the raw expression of all proteins in the target data. Proteins are kept in the same order as in c, and samples are ordered by hierarchical clustering.
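
The captions refer to reference pseudo-bulk data generated in stage 1, with a configurable number of samples and of cells per sample. A minimal sketch of such a simulation is given below. The Dirichlet draw of proportions, the summation of sampled cells, and all function and parameter names are assumptions made for illustration; the authors' actual procedure is the one described in the Methods and in their released code.

```python
import numpy as np


def simulate_pseudo_bulk(sc_expr, cell_types, n_samples=100,
                         cells_per_sample=50, seed=0):
    """Mix single cells into labelled pseudo-bulk samples (illustrative only).

    sc_expr    : (n_cells, n_proteins) single-cell abundance matrix
    cell_types : length n_cells sequence of cell-type labels
    Returns (bulk, proportions, type_names).
    """
    rng = np.random.default_rng(seed)
    sc_expr = np.asarray(sc_expr, dtype=float)
    cell_types = np.asarray(cell_types)
    type_names = np.unique(cell_types)
    # Row indices of the single cells that belong to each cell type.
    pools = [np.flatnonzero(cell_types == t) for t in type_names]

    bulk = np.zeros((n_samples, sc_expr.shape[1]))
    proportions = np.zeros((n_samples, len(type_names)))
    for i in range(n_samples):
        # Assumption: target proportions are drawn from a flat Dirichlet.
        target = rng.dirichlet(np.ones(len(type_names)))
        counts = rng.multinomial(cells_per_sample, target)
        for pool, n_k in zip(pools, counts):
            if n_k > 0:
                # Sample cells of this type with replacement and add them up.
                picked = rng.choice(pool, size=n_k, replace=True)
                bulk[i] += sc_expr[picked].sum(axis=0)
        # The label is the realized fraction of cells per type.
        proportions[i] = counts / cells_per_sample
    return bulk, proportions, type_names
```

Each simulated sample carries its realized cell-type fractions as a label, which is what allows a model trained on the reference domain to be evaluated or transferred to unlabelled bulk data.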

Extended Data Fig. 3 Performance of cell-cycle state inference by deconvolving monocyte pseudo samples with melanoma samples as reference.

a–e, Scatter plots of true (y axis) and predicted (x axis) cell-cycle state proportions for all stages, G1 stage, S stage and G2 stage (from left to right) in monocyte pseudo samples, deconvolved with melanoma samples as reference by scpDeconv (a), Scaden (b), MuSiC (c), BayesPrism (d) and DestVI (e).

Extended Data Fig. 4 Performance of cell-cycle state inference by deconvolving melanoma pseudo samples with monocytes as reference.

a–e, Scatter plots of true (y axis) and predicted (x axis) cell-cycle state proportions for all stages, G1 stage, S stage and G2 stage (from left to right) in melanoma pseudo samples, deconvolved with monocyte samples as reference by scpDeconv (a), Scaden (b), MuSiC (c), BayesPrism (d) and DestVI (e).

Extended Data Fig. 5 Survival analysis of the predicted cell class proportions of clinical melanoma samples.

a, Three cases illustrating the scpDeconv pipeline for deconvolving the clinical tissue proteomes of metastatic melanoma patients. b–e, Kaplan–Meier plots for 174 melanoma patients stratified by predicted proportion (above versus below the median) of CD146-high melanoma (b), CD146-low melanoma (c), melanocytes (d) and stroma (e). P values were computed using a two-sided log-rank test.
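
The survival comparison described in this caption (patients split at the median predicted proportion, groups compared with a two-sided log-rank test) can be sketched as follows. This is a textbook two-group log-rank test written with NumPy for illustration; it is not the authors' analysis code, and the function names and input layout are assumptions.

```python
import math

import numpy as np


def median_split(proportions):
    """Boolean mask: True where the predicted proportion is above the median."""
    proportions = np.asarray(proportions, dtype=float)
    return proportions > np.median(proportions)


def logrank_test(time, event, high):
    """Two-group log-rank test.

    time  : follow-up time per patient
    event : 1/True if the event (death) was observed, 0/False if censored
    high  : boolean group mask, for example the output of median_split
    Returns (chi-square statistic with 1 d.f., two-sided P value).
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    high = np.asarray(high, dtype=bool)

    observed_minus_expected = 0.0
    variance = 0.0
    for t in np.unique(time[event]):          # distinct event times
        at_risk = time >= t
        n = at_risk.sum()                     # patients at risk, both groups
        n_high = (at_risk & high).sum()       # patients at risk, "high" group
        died = (time == t) & event
        d = died.sum()                        # events at t, both groups
        d_high = (died & high).sum()          # events at t, "high" group
        observed_minus_expected += d_high - d * n_high / n
        if n > 1:
            # Hypergeometric variance of d_high under the null hypothesis.
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return float(chi2), float(p_value)
```

Because the statistic is squared, the resulting P value is two-sided: it does not depend on which of the two groups has the better survival.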

Extended Data Fig. 6 Comprehensive survival analysis by combining BRAF mutation and the predicted cell class proportions of clinical melanoma samples.

a, Distribution of predicted cell class proportions of melanoma samples grouped by BRAF mutation status (n = 90 melanoma samples for the BRAF non-mutation group; n = 64 melanoma samples for the BRAF mutation group). Boxplots show the median (center lines), interquartile range (hinges) and 1.5 times the interquartile range (whiskers). b, Forest plot showing the hazard ratios and 95% confidence intervals of BRAF mutation, vertical growth melanoma and radial growth melanoma in melanoma patients. Black squares represent the hazard ratios, and horizontal bars extend from the lower to the upper limit of the 95% confidence interval of the hazard ratio estimate. The P value was computed using a two-sided log-rank test. c–f, Kaplan–Meier plots stratified by predicted proportion (above versus below the median) of vertical growth melanoma (c,e) and radial growth melanoma (d,f) for melanoma samples without (c,d) and with (e,f) BRAF mutation. P values were computed using a two-sided log-rank test.

Supplementary information

Reporting Summary

Supplementary Table

Supplementary Table 1: Dataset description. Supplementary Table 2: Training parameters.

Source data

Source Data Fig. 2

Statistical source data.

Source Data Fig. 3

Statistical source data.

Source Data Fig. 4

Statistical source data.

Source Data Fig. 5

Statistical source data.

Source Data Extended Data Fig. 1

Statistical source data.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, F., Yang, F., Huang, L. et al. Deep domain adversarial neural network for the deconvolution of cell type mixtures in tissue proteome profiling. Nat Mach Intell 5, 1236–1249 (2023). https://doi.org/10.1038/s42256-023-00737-y

  • DOI: https://doi.org/10.1038/s42256-023-00737-y

  • Springer Nature Limited
