Background

Colorectal cancer (CRC) is the third leading cause of cancer-related deaths worldwide. Despite recent advancements in therapeutic techniques, the 5-year overall survival (OS) for this malignancy is only approximately 50% [1]. Therefore, there is an urgent need to develop prognostic biomarkers for improving CRC treatment. Substantial research has demonstrated that CRC is a heterogeneous disease with distinct molecular features and clinical responses [2,3,4]. An accurate understanding of the biological properties of CRC heterogeneity is essential for precise treatment, prediction of clinical prognosis, and the development of molecular subtype-specific targeted drugs.

Intratumour heterogeneity (ITH) is a hallmark of cancer that drives tumour evolution and disease progression. Increased ITH has been linked to a higher chance of recurrence, regardless of cancer type or treatment [5]. Exploring ITH is therefore helpful for developing accurate prognostic tools. Previous studies have shown that the ITH of CRC can be characterised using massively parallel sequencing data [6,7,8]. Recent studies on CRC subtypes have employed unsupervised clustering to classify whole-genome expression profiles derived from bulk tumours. This unsupervised method has been applied effectively to a number of malignancies [9] but performs less well on noisy mixtures of unknown composition. Deconvolution is an alternative unsupervised approach that estimates the underlying genomic subclones in complex tissues, which helps to characterise tumour heterogeneity and predict prognosis [10].
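As a rough illustration of what unsupervised deconvolution does, the Python sketch below factorises a toy bulk expression matrix into subclone profiles and per-sample subclone proportions. It uses non-negative matrix factorisation with multiplicative updates as a stand-in, because the text does not name the algorithm used in the study. The function names (`nmf`, `proportions`, `reconstruction_error`), the toy matrix, and the choice of `k` are all assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def nmf(x, k, iters=300, seed=0, eps=1e-9):
    """Approximate x (genes x samples) as w (genes x k) times h (k x samples).

    Multiplicative updates keep every entry of w and h non-negative.
    """
    rng = random.Random(seed)
    n_genes, n_samples = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n_genes)]
    h = [[rng.random() + 0.1 for _ in range(n_samples)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_samples)]
             for i in range(k)]
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n_genes)]
    return w, h


def proportions(w, h):
    """Per-sample subclone fractions (assumed convention: each profile's total
    signal is moved from w into h, then every sample is normalised to sum to 1)."""
    scale = [sum(col) for col in zip(*w)]
    scaled = [[v * s for v in row] for row, s in zip(h, scale)]
    return [[v / (sum(col) or 1.0) for v in col] for col in zip(*scaled)]


def reconstruction_error(x, w, h):
    """Squared Frobenius distance between x and its factorisation."""
    approx = matmul(w, h)
    return sum((a - b) ** 2 for ra, rb in zip(x, approx) for a, b in zip(ra, rb))


# Toy bulk data: four samples mixing two hypothetical subclone profiles.
profile_a = [8.0, 6.0, 7.0, 1.0, 0.5, 1.0]
profile_b = [1.0, 0.5, 1.0, 9.0, 7.0, 8.0]
mix = [0.9, 0.7, 0.3, 0.1]
bulk = [[a * m + b * (1 - m) for m in mix] for a, b in zip(profile_a, profile_b)]

w, h = nmf(bulk, k=2)
for sample in proportions(w, h):
    print([round(p, 2) for p in sample])
```

The order of the recovered components is arbitrary, and on real data the estimated fractions depend on the normalisation convention and on how `k` is chosen, so the output should be read as relative rather than absolute proportions.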

With the advent of sequencing technology, numerous studies on gene signature biomarkers have been published. However, their clinical applications remain limited. Current gene expression profiling methods are expensive, time-consuming, and invasive, and they require tumour biopsies for tissue extraction. They are therefore not available to all patients. In contrast, radiomic biomarkers do not incur any additional expense, because medical imaging is a routine part of the clinical decision-making process. Unlike biopsies, medical imaging is noninvasive and can provide information about the entire tumour phenotype, including ITH. Multiple studies have reported an association between radiomic characteristics and underlying gene expression patterns.

Radiogenomics explores the association between radiomic features and genomic characteristics, with the aim of revealing features that reflect the underlying biological functions most related to clinical phenotypes. Numerous studies have established the viability of radiogenomics for identifying intrinsic molecular subtypes and gene expression profiles in cancers such as ovarian cancer [5B, C). A robust predictive model was constructed by establishing a link between genomics and radiomics. When the model is applied, only imaging data are required and genomic data are not, which substantially lowers the barrier to its clinical application. Imaging examinations are already routinely used for tumour diagnosis and therapy decisions, so using images as input for prognostic prediction models does not significantly increase healthcare expenditure. Furthermore, imaging examinations are noninvasive and can be repeated at different time points. CT-based radiogenomic signatures allow us to forecast patient prognosis and ITH prior to surgery.
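The practical point made above, that genomic data are needed only while the model is built and imaging data alone suffice when it is applied, can be sketched in Python as follows. This is a minimal illustration and not the study's model: the use of ridge regression, the function names (`solve`, `fit_ridge`, `predict`), the two made-up radiomic features, and the toy subclone proportions are all assumptions.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ridge(features, target, lam=0.0):
    """Fit target ~ intercept + features via the ridge normal equations.

    lam is the L2 penalty; the intercept is left unpenalised.
    """
    rows = [[1.0] + list(f) for f in features]
    p = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    for i in range(1, p):
        gram[i][i] += lam
    rhs = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(p)]
    return solve(gram, rhs)


def predict(coef, features):
    """Apply the fitted coefficients to new feature rows."""
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], f)) for f in features]


# Training cohort (hypothetical): paired radiomic features and a genomic
# subclone proportion obtained from expression data.
train_features = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]]
train_target = [0.2 + 0.05 * a + 0.02 * b for a, b in train_features]

coef = fit_ridge(train_features, train_target, lam=0.01)

# Application cohort (hypothetical): imaging features only, no genomic data.
imaging_only = [[2.5, 3.0], [4.5, 1.5]]
print(predict(coef, imaging_only))
```

In this arrangement the genomic measurement appears only as the training target, so a patient who has had a CT scan but no expression profiling can still be scored.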

Because it links genomics to radiomics, the model is substantially more interpretable. Imaging characteristics have been related to CRC outcomes, such as treatment response, lymph node metastasis, local recurrence, and survival [28,29,30], but their biological underpinnings remain unclear. In the present study, we introduced no prior information and identified four CRC genomic subclones by analysing a large number of gene expression profiles with a fully unsupervised deconvolution strategy. In our study, tumours with a low proportion of cell cycle subclones and a high proportion of extracellular matrix (ECM) subclones were associated with shorter survival. Among the signalling pathways within the cell cycle subclone, the G1/S transition and cell cycle checkpoint pathways likely reflect the DNA damage response and can be exploited for prognosis [31]. Cell cycle checkpoints detect damaged DNA and temporarily halt cell cycle progression, which allows repair and prevents further damage. Cell cycle dysregulation can lead to abnormal cell proliferation and apoptosis, and is responsible for tumourigenesis. Defects in cell cycle checkpoints may be a cause of genomic instability in tumours [32]. Therefore, abnormalities in cell cycle pathways have prognostic significance in CRC. The ECM subclone is also strongly associated with prognosis. ECM remodelling has been reported to be associated with CRC carcinogenesis and progression [33,34]. As a major component of the tumour microenvironment, the ECM plays a crucial role in tumour progression and treatment response. Chakravarthy et al. built a signature that linked ECM genes to immune evasion and immunotherapy failure [35]. Eleven radiomic features were chosen for our model, the majority of which were enriched in ECM- and immune-related pathways, which are well-known prognosis-related pathways.
This suggests that the prognostic value of these radiomic signatures has a biological foundation. These morphological textures and spatial features are inseparable from the gene- and cell-level characteristics. Machine learning helps us better understand the biology behind these morphological textures and spatial features. Using this three-step methodology, we created a prognostic prediction model that provides an entry point for elucidating the underlying molecular mechanisms.

Our study had several limitations. First, this was a retrospective study, which carries inherent limitations. In follow-up research, these findings should be validated in prospective studies to reduce the bias caused by uncontrollable factors in retrospective designs. Second, our genomic development dataset and the corresponding testing dataset were obtained from public databases. However, cohorts 3 and 4 came from local medical centres and provided in-house data. CT scans acquired on different machines at different centres better validate the robustness and clinical usability of the model. Third, the regions of interest were annotated manually, a time-consuming and tedious process. We are currently investigating more robust semi-automatic annotation methods [36] to address this issue.

Conclusions

In conclusion, we conducted an integrative analysis of genomics and radiomics to dissect ITH and build models for predicting the prognosis of patients with CRC. The unsupervised deconvolution method for genomic subclone identification provides a new perspective for exploring tumour heterogeneity. Radiogenomic signatures can be independent prognostic biomarkers and may serve as surrogates for genomic signatures. This integrative radiogenomic strategy shows great promise for understanding ITH and can be extended to other cancers to help patients who might benefit from precise clinical treatment.