Abstract
This research explores the potential of multimodal fusion for the differential diagnosis of early-stage lung adenocarcinoma (LUAD) (tumor sizes < 2 cm). It combines liquid biopsy biomarkers, specifically extracellular vesicle long RNA (evlRNA), with computed tomography (CT) attributes. The fusion model achieves an area under the receiver operating characteristic curve (AUC) of 91.9% for the four-class classification of adenocarcinoma, along with a benign-malignant AUC of 94.8% (sensitivity: 89.1%, specificity: 94.3%). These outcomes outperform the diagnostic capabilities of the single-modal models and of human experts. A comprehensive SHapley Additive exPlanations (SHAP) analysis is provided to offer deep insights into model predictions. Our findings reveal the complementary interplay between evlRNA and image-based characteristics, underscoring the significance of integrating diverse modalities in diagnosing early-stage LUAD.
Lung cancer stands as the leading cause of cancer-related deaths worldwide1. Early detection through low-dose CT (LDCT) has shown significant potential in reducing mortality rates, as demonstrated by notable trials such as the National Lung Screening Trial (NLST)2 and the Dutch-Belgian Lung Cancer Screening Trial (NELSON)3. One significant challenge in LDCT screening is the high rate of false-positive results, leading to unnecessary biopsy or surgical procedures. For instance, the NLST reported a false-positive rate of 26.3% for baseline screening2, while the NELSON trial reported a rate of 19.8%3. Lung nodules identified during LDCT screening, often smaller than 2 cm4,5, are challenging to biopsy effectively6,7. Therefore, the primary approach involves close monitoring, during which tumors may nonetheless progress or metastasize.
In addition to LDCT screening, liquid biopsies can identify various biomolecular features, providing potential insights into disease status8. Combining liquid biopsy with AI methods holds significant promise for early-stage diagnosis9,10. Extracellular vesicle long RNA (evlRNA), identified as a candidate biomarker, is enriched in the blood of lung cancer patients compared to healthy controls, showing significant diagnostic value in early-stage LUAD patients11,12. However, many current liquid biopsies focusing on early cancer detection lack the sensitivity needed for reliable identification of early-stage cancers13.
Artificial intelligence (AI) and biomarkers, both non-invasive, hold substantial promise in shaping the future of lung cancer screening14. The combined assessment of CT and evlRNA features in lung cancer cases has not been thoroughly investigated. Our study aims to explore the complementarity of these two modalities, leveraging their respective strengths and addressing individual weaknesses for the early-stage diagnosis of LUAD with tumors smaller than 2 cm.
This study enrolled 146 participants (Table S1) who underwent lung surgeries due to the presence of pulmonary nodules. These individuals had available preoperative blood samples and chest CT scans. Among them, 111 patients were diagnosed with LUAD, while 35 were categorized as benign. The LUAD group was subdivided into three pathological categories: adenocarcinoma in situ (AIS; N = 36), minimally invasive adenocarcinoma (MIA; N = 34), and invasive adenocarcinoma (IA; N = 41).
Model development details are illustrated in Fig. 1A. We extracted imaging features, referred to as Rad features, from a pre-trained multitask 3D DenseSharp neural network15. These features included malignancy probability, IA probability, invasiveness category, attenuation category, 2D diameter, and volumetric consolidation tumor ratio (vCTR). In addition, blood samples were collected in 10 mL K2EDTA anticoagulant vacutainer tubes. Subsequent steps for serum extracellular vesicle (EV) purification, RNA isolation and RNA-seq analysis followed procedures from our prior study12. We selected 17 evlRNA features from differentially expressed genes (DEGs) between the LUAD and control groups. Moreover, to evaluate our methods compared to human performance and investigate the potential enhancement of diagnostics through the integration of human expertise, we conducted an observer study involving both a senior and a junior investigator.
For multimodal fusion, incorporating Rad features extracted by AI from CT, evlRNA features from liquid biopsy, and observation features from clinicians, we employed the XGBoost machine learning framework16. Separate XGBoost models were established for each feature fusion scenario, with a primary training objective of multi-class classification (IA, MIA, AIS, Benign). The flexibility to use different combinations allows for diverse subgroup analyses. A 5-fold cross-validation approach was adopted, and average results are reported.
The performance evaluation of the multimodal fusion is shown in Fig. 1B–D, revealing several intriguing discoveries:
(1) Combining evlRNA and Rad features results in a highly effective diagnostic method, with an AUC of 0.919 (Fig. 1B). This combined model surpasses the unimodal models and is comparable to the performance of the senior expert.
(2) Integrating human expertise with the combination of evlRNA and Rad characteristics leads to improved results, with AUC values of 0.934 and 0.924 for the inclusion of senior and junior experts, respectively (Fig. 1C).
(3) evlRNA-based and image-based features complement each other, displaying a mutually reinforcing relationship (Fig. 1D). The three subplots illustrate that combining evlRNA with image-based attributes (Rad, (v)CTR, observer) leads to better performance than using either modality alone.
The evlRNA + Rad model outperforms the other multi-modal fusion models without human expert intervention, and we use it as our standard fusion model in what follows. We conducted a detailed assessment of our model's performance, concentrating on three vital clinical subtasks (see Table 1):
(a) Binary classification distinguishing malignant nodules (IA, MIA, or AIS) from benign nodules. The fusion model attained an AUC of 94.8%, with a sensitivity of 89.1% and a specificity of 94.3%.
(b) Binary classification distinguishing invasive nodules (IA or MIA) from preinvasive nodules (AIS or Benign), with the goal of reducing overdiagnosis in line with the 2021 WHO guidelines17. Invasive nodules require surgical intervention due to their worse prognosis, while preinvasive nodules usually need only CT monitoring. The fusion model performed well in this task, with an AUC of 87.2%, a sensitivity of 80.0%, and a specificity of 87.1%.
(c) Binary classification distinguishing IA nodules from MIA among invasive nodules. MIA patients have a disease-free survival rate of nearly 100% with careful limited resection18, whereas IA patients have a lower disease-free survival rate of around 60% to 70%19,20. The fusion model achieved excellent results on this task, with an AUC of 92.1%, a sensitivity of 92.8%, and a specificity of 88.6%.
Our fusion model consistently outperforms single-modal models across different subtasks, just as it did in the four-class classification. Notably, our fusion method significantly improves specificity, effectively reducing false positives and overdiagnosis. The fusion model exceeds the specificity of senior experts by 14.3%, 9.7%, and 11.9% in subtasks (a), (b), and (c), respectively. Furthermore, when combining evlRNA, Rad, and senior expert inputs, our model achieves 100% specificity in distinguishing malignant from benign nodules during cross-validation.
To enhance the understanding of feature importance in predictive modeling, we employed the SHapley Additive exPlanations (SHAP) post hoc explanatory framework21. We applied this framework to three models: evlRNA, Rad, and evlRNA + Rad. The feature impacts for the 4-category classification are depicted in Fig. 1E. Notably, in the fusion model, vCTR is the most crucial feature. We further extended the SHAP framework to individualized validation predictions (Fig. 1F). The visualization shows that this patient has a high probability (0.94) of being classified as IA, driven primarily by an IA probability of 0.9312, a vCTR value of 0.2426, a gene CCND value of 15.89, and other risk-contributing factors. Understanding individual predictions is valuable for clinical decision-making. In addition, we explored how feature values relate to predicted categories (Fig. S1 in Supplementary). In the evlRNA analysis, certain genes exhibit distinct correlations with category predictions, which become even more evident in the Rad feature analysis. We believe Rad features, being AI-generated, naturally possess discriminative abilities. In the joint analysis of Rad and evlRNA features, the top five crucial features combine genetic and imaging traits, highlighting their synergistic effects (details in Supplementary Results).
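SHAP assigns each feature a Shapley value from cooperative game theory; in practice TreeExplainer computes these efficiently for tree ensembles such as XGBoost, but the underlying definition can be illustrated exactly on a toy model by brute-force coalition enumeration (the `predict` function and background data below are hypothetical, not the study's model):

```python
import itertools
import math
import numpy as np

def exact_shapley(predict, x, background):
    """Exact Shapley values for one sample by enumerating feature coalitions.

    Features outside a coalition are replaced by background means, the
    expectation TreeExplainer approximates efficiently. Brute force is
    exponential in the number of features, so only viable for toy models.
    """
    p = len(x)
    base = np.asarray(background).mean(axis=0)

    def f(subset):
        z = base.copy()
        z[list(subset)] = x[list(subset)]
        return float(predict(z[None, :])[0])

    phi = np.zeros(p)
    for i in range(p):
        others = [j for j in range(p) if j != i]
        for r in range(p):
            # Standard Shapley weight for coalitions of size r.
            w = math.factorial(r) * math.factorial(p - r - 1) / math.factorial(p)
            for S in itertools.combinations(others, r):
                phi[i] += w * (f(S + (i,)) - f(S))
    return phi  # phi sums to f(all) - f(none): the "efficiency" property
```

For a linear model the attribution reduces to each feature's contribution above the background mean, which makes the sketch easy to sanity-check.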
Assessing a model's robustness is crucial for both evaluation and practical use. We evaluated the robustness of our XGBoost model by adding Gaussian noise to the input features (Fig. S2). With low noise, the model's performance declines slightly, but as the noise increases, the degradation intensifies. Remarkably, within a specific noise range, our multimodal fusion model consistently outperforms the single-modal models, showcasing its robustness.
Our study has a few limitations. Firstly, we included only 146 participants, owing to the difficulty of obtaining both evlRNA detection data and CT imaging samples; collecting evlRNA information is time-consuming and expensive. A larger dataset will be needed in the future to avoid overfitting and improve validation accuracy. Secondly, our study involved only internal validation, without external validation, leaving the model's applicability and generalizability unexplored.
In summary, our study has underscored the complementary nature between evlRNA-based and image-based features, with human analysis integration leading to improved performance. These results emphasize the critical importance of multimodal fusion to enhance differential diagnosis of early-stage lung adenocarcinoma in the LDCT screening.
Methods
Data characteristics
The study involved 146 participants who underwent lung operations due to the presence of pulmonary nodules between 2018 and 2020. This group included 111 patients diagnosed with lung adenocarcinoma (LUAD) and 35 controls classified as benign cases. Essential participant characteristics are provided in Table S1. The following inclusion criteria for the LUAD patients were applied: (a) patients pathologically proven to have LUAD (tumor size < 2 cm), (b) obtainable preoperative blood samples, (c) obtainable chest CT scan, and (d) patients gave their informed consent before enrollment.
This study was approved by the ethics committee of Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine, and complied with all relevant ethical regulations including the Declaration of Helsinki. All participants were from a registered lung cancer screening study (China Lung Cancer Screening Study, NCT03975504), and signed informed consent to take part in the research. Our study did not specifically address cases involving multiple nodules. In our cohort, only two individuals had multiple pulmonary nodules. For these cases, we chose to analyze only the most severe nodule.
In this study, the CT images of the latest examination before surgery were collected from a single clinical center (Chest Hospital affiliated to Shanghai Jiaotong University School of Medicine). Slice thicknesses range from 0.625 mm to 1.5 mm. The pathological label and mass center of each lesion were manually annotated by a junior thoracic radiologist according to the corresponding pathological reports, and these annotations were then confirmed by a senior radiologist with 15 years of experience in chest CT. Patient identities were anonymized for privacy protection.
Pre-trained DenseSharp model
To extract nodule features from CT images, we utilized a pre-trained 3D DenseSharp neural network15, which had undergone extensive training on two internal datasets: Pretraining cohort A contained 651 subcentimeter nodules15, and pretraining cohort B comprised 4728 nodules from the Pulmonary-RadPath dataset22. Number of nodules for pretraining can be found in Table S2. The DenseSharp model generates outputs through five heads: four for classification and one for creating a 3D nodule segmentation. The four classification tasks include invasiveness with four categories (Benign, AIS, MIA, IA), malignancy (benign/malignant), IA (non-IA/IA), and attenuation with three categories (solid, part-solid, ground-glass).
We conducted standard data preprocessing adhering to common practices: (1) resampling CT volumes to a voxel size of 1 mm × 1 mm × 1 mm; (2) normalizing Hounsfield Units to the range [−1, 1]; (3) cropping a 32 × 32 × 32 volume centered at the centroid of each lesion. The input to our proposed model is therefore a cubic CT volume patch measuring 32 mm × 32 mm × 32 mm.
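The three preprocessing steps can be sketched as follows. The paper specifies only the steps themselves; the nearest-neighbor resampling and the HU clipping window of [−1024, 400] used here are illustrative assumptions:

```python
import numpy as np

def resample_iso(volume, spacing):
    """Nearest-neighbor resampling of a CT volume to 1 mm isotropic voxels."""
    out_shape = np.round(np.array(volume.shape) * np.array(spacing)).astype(int)
    # Map each output (1 mm) index back to the nearest input voxel index.
    idx = [np.minimum((np.arange(n) / s).astype(int), d - 1)
           for n, s, d in zip(out_shape, spacing, volume.shape)]
    return volume[np.ix_(*idx)]

def preprocess(volume_hu, spacing, center_vox, patch=32,
               hu_min=-1024.0, hu_max=400.0):
    """Resample to 1 mm isotropic, normalize HU to [-1, 1], crop a lesion cube.

    The HU window [hu_min, hu_max] is an assumption; the paper states only
    that Hounsfield Units are normalized to [-1, 1].
    """
    vol = resample_iso(volume_hu, spacing)
    # Clip and linearly rescale Hounsfield Units to [-1, 1].
    vol = np.clip(vol, hu_min, hu_max)
    vol = 2.0 * (vol - hu_min) / (hu_max - hu_min) - 1.0
    # Crop a patch^3 cube at the lesion centroid (now in mm coordinates);
    # pad with air (-1) so crops near the volume border stay in bounds.
    c = np.round(np.asarray(center_vox) * np.asarray(spacing)).astype(int)
    half = patch // 2
    vol = np.pad(vol, half, mode="constant", constant_values=-1.0)
    z, y, x = c + half
    return vol[z - half:z + half, y - half:y + half, x - half:x + half]
```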
The training employs early stopping based on validation loss—training halts if the validation loss does not decrease within 10 epochs. We apply online data augmentations, such as random rotation, flipping, and translation, to every volume. We use the Adam optimizer23 to train all models end-to-end for up to 200 epochs. Our experiments are conducted using PyTorch 1.1124 on 2 Nvidia RTX 3090 GPUs.
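The validation-loss early-stopping rule described above is framework-agnostic; a minimal sketch:

```python
class EarlyStopping:
    """Stop training when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop one would call `step` after each validation pass and break out of the loop once it returns True.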
Extracting imaging features
We employed the pre-trained 3D DenseSharp neural network to perform the classification tasks and generate the nodule mask. We collected prediction logits from the classification heads, yielding four nodule attributes. Since the size of the solid component within SSNs observed on CT images is closely related to the extent of tumor infiltration25,26, we developed an internal tool to calculate the 2D diameter (mm), the consolidation tumor ratio (CTR), and the volumetric CTR (vCTR). The two ratios differ subtly: CTR measures the diameter fraction of the solid component within the nodule, whereas vCTR quantifies its volumetric proportion. Combining these measurements with the four attributes above yields a set of six nodule imaging features, termed Rad features: malignancy probability, IA probability, invasiveness category, attenuation category, clinically measured 2D diameter (mm), and vCTR.
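The internal measurement tool is not public; assuming binary nodule and solid-component masks in (z, y, x) order, one plausible sketch of the CTR/vCTR definitions is:

```python
import numpy as np

def ctr_and_vctr(nodule_mask, solid_mask, spacing_mm=(1.0, 1.0, 1.0)):
    """Illustrative CTR and vCTR from binary masks.

    CTR  ~ ratio of the solid component's maximal axial extent to the
           nodule's maximal axial extent (a simplification of the clinical
           diameter measurement).
    vCTR ~ ratio of solid-component volume to total nodule volume.
    """

    def max_axial_extent(mask):
        # Largest in-plane extent over all axial slices, in mm.
        best = 0.0
        for sl in mask:
            if not sl.any():
                continue
            ys, xs = np.nonzero(sl)
            dy = (ys.max() - ys.min() + 1) * spacing_mm[1]
            dx = (xs.max() - xs.min() + 1) * spacing_mm[2]
            best = max(best, dy, dx)
        return best

    d_nodule = max_axial_extent(nodule_mask)
    d_solid = max_axial_extent(solid_mask)
    ctr = d_solid / d_nodule if d_nodule else 0.0
    vctr = solid_mask.sum() / nodule_mask.sum() if nodule_mask.any() else 0.0
    return ctr, vctr
```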
Extracting evlRNA features
The methodologies employed for the collection of blood samples, serum extracellular vesicle (EV) purification, RNA isolation, characterization of EVs, construction of evlRNA libraries, and subsequent RNA-seq analysis closely follow those detailed in our previous work12. Differential expression analysis between the LUAD and control groups revealed 145 upregulated and 363 downregulated DEGs (p value < 0.05, fold change > 1.5). Feature selection was performed with the Boruta algorithm27 to identify all relevant variables for machine learning. A signature of 17 DEGs was selected as diagnostically informative EV-associated evlRNAs: HLA-E, BIN2, Z97192.1, KAZN, CCDC9B, PLEKHO1, PTGS1, ANXA4, SNX29, CEP164, GFRA2, TBC1D24, NPC2, CCND1, KIAA1217, DMD, and SEZ6L.
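Boruta works by pitting each real feature against "shadow" copies with shuffled values: a feature is kept only if it beats the best shadow more often than chance. As a minimal illustration of that core idea — substituting a univariate correlation proxy for the random-forest importances the real Boruta package uses — consider:

```python
import numpy as np

def shadow_select(X, y, n_rounds=50, seed=0):
    """Minimal Boruta-style selection with a univariate importance proxy.

    |correlation with the label| stands in here for the random-forest
    importance used by the real Boruta algorithm; this is an illustration
    of the shadow-feature idea, not the published procedure.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    hits = np.zeros(p, dtype=int)
    yc = y - y.mean()
    for _ in range(n_rounds):
        shadows = rng.permuted(X, axis=0)      # shuffle each column independently
        full = np.hstack([X, shadows])
        fc = full - full.mean(axis=0)
        imp = np.abs(fc.T @ yc) / (np.linalg.norm(fc, axis=0)
                                   * np.linalg.norm(yc) + 1e-12)
        best_shadow = imp[p:].max()
        hits += imp[:p] > best_shadow          # did the real feature win?
    # Keep features that beat the best shadow in a majority of rounds.
    return np.nonzero(hits > n_rounds // 2)[0]
```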
Integrating multimodality features
We employed the machine learning framework XGBoost16 to perform multimodal fusion of various features. To integrate human expertise, we gathered pathological four-type judgments from both doctors for all samples. These judgments were then used as features, combined with other modal features, and introduced as fusion features into the XGBoost model for training and validation. In our experiments, we established separate XGBoost models for each feature fusion scenario.
The primary training objective of our model is multi-class classification, distinguishing between the IA, MIA, AIS, and Benign categories. This is achieved using the multiclass softmax objective, which generates a probability distribution over the classes. For prediction, we can flexibly combine these class probabilities for various subgroup analyses. The positive and negative probabilities for each task are calculated as follows: in task (a), y_positive = y_IA + y_MIA + y_AIS and y_negative = y_Benign; in task (b), y_positive = y_IA + y_MIA and y_negative = y_AIS + y_Benign; in task (c), y_positive = y_IA and y_negative = y_MIA.
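The per-task aggregation above can be sketched as follows. The `proba` array would come from an XGBoost model trained with a multiclass softmax-probability objective; the (IA, MIA, AIS, Benign) column order is an assumption about how the labels were encoded:

```python
import numpy as np

# Assumed class order in the softmax output.
IA, MIA, AIS, BENIGN = 0, 1, 2, 3

def task_probabilities(proba, task):
    """Collapse 4-class probabilities into each subtask's positive probability.

    `proba` is an (n, 4) array of class probabilities per sample.
    """
    if task == "a":   # malignant (IA/MIA/AIS) vs benign
        return proba[:, [IA, MIA, AIS]].sum(axis=1)
    if task == "b":   # invasive (IA/MIA) vs preinvasive (AIS/Benign)
        return proba[:, [IA, MIA]].sum(axis=1)
    if task == "c":   # IA vs MIA, evaluated on invasive nodules only
        return proba[:, IA]
    raise ValueError(f"unknown task: {task}")
```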
We adopted a 5-fold cross-validation approach, wherein the entire dataset was evenly divided into five distinct subsets. During each iteration, four subsets were used for training, leaving one subset for validation. Training stops at the maximum number of boosting iterations, with num_boost_round left at its default value of 10. The reported performance metrics represent the average results obtained across the five folds. To convert predicted probabilities into diagnostic decisions, we selected the operating threshold using the Youden index, which balances sensitivity and specificity. Our multi-modal fusion model's parameters are shown in Table S3.
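The Youden criterion picks the threshold maximizing J = sensitivity + specificity − 1 along the ROC sweep; a minimal numpy sketch (real pipelines often use sklearn.metrics.roc_curve for the same computation):

```python
import numpy as np

def youden_threshold(y_true, y_score):
    """Operating threshold maximizing Youden's J = sensitivity + specificity - 1.

    Scores at or above the returned threshold are called positive.
    """
    y = np.asarray(y_true, float)
    s = np.asarray(y_score, float)
    order = np.argsort(-s)            # sweep thresholds from high to low
    y, s = y[order], s[order]
    P = y.sum()
    N = len(y) - P
    tpr = np.cumsum(y) / P            # sensitivity when the top-k are positive
    fpr = np.cumsum(1 - y) / N        # 1 - specificity
    k = int(np.argmax(tpr - fpr))
    return s[k], tpr[k], 1.0 - fpr[k]  # threshold, sensitivity, specificity
```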
Observation study
To compare our methodologies with human proficiency, an experienced senior radiologist (with over a decade of expertise in chest CT interpretation) and a junior radiologist (with 3 years of experience in chest CT interpretation) from Chest Hospital affiliated to Shanghai Jiaotong University School of Medicine were consulted. These professionals, who were kept unaware of the histopathological findings and clinical information, independently undertook the task of classifying and diagnosing all the nodules. The outcomes of their expert-based image interpretations were referred to as observation features.
Model robustness analysis
We injected Gaussian noise (mean 0, standard deviation σ) into the input features of the XGBoost model and observed the trend of the 5-fold validation AUC for the 4-category classification as σ increased. Three models were analyzed: Rad, evlRNA, and the multi-modal fusion model evlRNA + Rad. For each selected σ, we randomly generated Gaussian noise 100 times and averaged the AUC over these 100 runs. In Fig. S3, the average of the 100 runs is drawn as a solid line, with shading indicating the standard-deviation interval across the 100 noise injections.
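The noise-injection protocol can be sketched as follows, where `predict` is any hypothetical scoring function (in the study it would be a trained XGBoost model's positive-class probability):

```python
import numpy as np

def auc(y_true, y_score):
    """AUC via the Mann-Whitney U statistic (ties counted as half)."""
    y = np.asarray(y_true)
    pos, neg = y_score[y == 1], y_score[y == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

def noise_robustness(predict, X, y, sigmas, n_repeats=100, seed=0):
    """Mean and std of AUC under additive Gaussian input noise of scale sigma."""
    rng = np.random.default_rng(seed)
    means, stds = [], []
    for sigma in sigmas:
        aucs = [auc(y, predict(X + rng.normal(0.0, sigma, X.shape)))
                for _ in range(n_repeats)]
        means.append(float(np.mean(aucs)))
        stds.append(float(np.std(aucs)))
    return means, stds
```

Plotting `means` with a `stds`-wide shaded band against `sigmas` reproduces the style of figure described above.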
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Data availability
The sequencing data have been deposited in the National Center for Biotechnology Information Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo/) database under accession number GSE200288. Additional data utilized and/or analyzed during the current study are available from the corresponding author upon reasonable request.
Code availability
Our pipeline is available at https://github.com/yinghyu5214/fusion.
References
Siegel, R. L., Miller, K. D., Wagle, N. S. & Jemal, A. Cancer statistics, 2023. CA Cancer J. Clin. 73, 17–48 (2023).
Aberle, D. R. et al. Reduced lung-cancer mortality with low-dose computed tomographic screening. N. Engl. J. Med. 365, 395–409 (2011).
de Koning, H. J. et al. Reduced lung-cancer mortality with volume CT screening in a randomized trial. N. Engl. J. Med. 382, 503–513 (2020).
National Lung Screening Trial Research Team. Results of initial low-dose computed tomographic screening for lung cancer. N. Engl. J. Med. 368, 1980–1991 (2013).
Yang, W. et al. Community-based lung cancer screening with low-dose CT in China: results of the baseline screening. Lung Cancer 117, 20–26 (2018).
Andrade, J. R. et al. CT-guided percutaneous core needle biopsy of pulmonary nodules smaller than 2 cm: technical aspects and factors influencing accuracy. J. Bras. Pneumol. 44, 307–314 (2018).
Gould, M. K. et al. Evaluation of individuals with pulmonary nodules: when is it lung cancer? Diagnosis and management of lung cancer, 3rd ed: American College of Chest Physicians evidence-based clinical practice guidelines. Chest 143, e93S–e120S (2013).
Connal, S. et al. Liquid biopsies: the future of cancer early detection. J. Transl. Med. 21, 118 (2023).
Shin, H. et al. Early-stage lung cancer diagnosis by deep learning-based spectroscopic analysis of circulating exosomes. ACS Nano 14, 5435–5444 (2020).
Kothen-Hill, S. T. et al. Deep learning mutation prediction enables early stage lung cancer detection in liquid biopsy. in International Conference on Learning Representations (2018).
Li, Y. et al. Extracellular vesicles long RNA sequencing reveals abundant mRNA, circRNA, and lncRNA in human blood as potential biomarkers for cancer diagnosis. Clin. Chem. 65, 798–808 (2019).
Zhang, Y. et al. Extracellular vesicle long RNA markers of early‐stage lung adenocarcinoma. Int. J. Cancer 152, 1490–1500 (2023).
Klein, E. A. et al. Clinical validation of a targeted methylation-based multi-cancer early detection test using an independent validation set. Ann. Oncol. 32, 1167–1177 (2021).
Adams, S. J. et al. Lung cancer screening. Lancet 401, 390–408 (2023).
Zhao, W. et al. 3D deep learning from CT scans predicts tumor invasiveness of subcentimeter pulmonary adenocarcinomas. Cancer Res. 78, 6881–6889 (2018).
Chen, T. & Guestrin, C. Xgboost: a scalable tree boosting system. in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (2016).
Nicholson, A. G. et al. The 2021 WHO classification of lung tumors: impact of advances since 2015. J. Thorac. Oncol. 17, 362–387 (2022).
Travis, W. D. et al. International Association for the Study of Lung Cancer/American Thoracic Society/European Respiratory Society international multidisciplinary classification of lung adenocarcinoma. J. Thorac. Oncol. 6, 244–285 (2011).
Vazquez, M. et al. Solitary and multiple resected adenocarcinomas after CT screening for lung cancer: histopathologic features and their prognostic implications. Lung Cancer 64, 148–154 (2009).
Borczuk, A. C. et al. Invasive size is an independent predictor of survival in pulmonary adenocarcinoma. Am. J. Surg. Pathol. 33, 462 (2009).
Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. in Advances in Neural Information Processing Systems Vol. 30 (2017).
Yang, J. et al. Hierarchical classification of pulmonary lesions: a large-scale radio-pathomics study. in Medical Image Computing and Computer Assisted Intervention–MICCAI 2020: 23rd International Conference, Lima, Peru, October 4–8, 2020, Proceedings, Part VI 23 497–507 (Springer, 2020).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. in International Conference on Learning Representations (ICLR) (San Diego, 2015).
Paszke, A. et al. Automatic differentiation in PyTorch. in NIPS Autodiff Workshop (2017).
Lee, K. H. et al. Correlation between the size of the solid component on thin-section CT and the invasive component on pathology in small lung adenocarcinomas manifesting as ground-glass nodules. J. Thorac. Oncol. 9, 74–82 (2014).
Sun, J. et al. Deep learning-based solid component measuring enabled interpretable prediction of tumor invasiveness for lung adenocarcinoma. Lung Cancer 186, 107392 (2023).
Kursa, M. B. & Rudnicki, W. R. Feature selection with the Boruta package. J. Stat. Softw. 36, 1–13 (2010).
Acknowledgements
This work was supported by the following sources: National Multi-disciplinary Treatment Project for Major Diseases (2020NMDTP to Baohui Han), Shanghai Science and Technology Committee Program (19QA1408000 to Yanwei Zhang), the foundation of Shanghai Chest Hospital (2021YNZYB01 to Yanwei Zhang), National Science Foundation of China (82002941 to Beibei Sun).
Author information
Contributions
Yanwei Zhang: Conceptualization, data curation, funding acquisition, and writing—original draft; Beibei Sun: Investigation and resources; Yinghong Yu and Jun Lu: Methodology, formal analysis and visualization; Yuqing Lou: Resources and data curation; Fangfei Qian and Tianxiang Chen: Investigation and resources; Li Zhang: Software; Jiancheng Yang: Software and supervision; Hua Zhong and Ligang Wu: Supervision and project administration; Baohui Han: Conceptualization, validation, writing—review & editing, supervision and funding acquisition. Yanwei Zhang, Beibei Sun, Yinghong Yu and Jun Lu are co-first authors and they contribute equally to this work.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Zhang, Y., Sun, B., Yu, Y. et al. Multimodal fusion of liquid biopsy and CT enhances differential diagnosis of early-stage lung adenocarcinoma. npj Precis. Onc. 8, 50 (2024). https://doi.org/10.1038/s41698-024-00551-8