Introduction

The second most common cause of death worldwide is stroke, which is responsible for 11% of total deaths according to the WHO1. Strokes can be divided into hemorrhagic and ischemic strokes. Hemorrhagic strokes are caused by ruptured arteries while ischemic strokes are caused by clots or other obstructions/occlusions within arteries2. Approximately 60% of all acute strokes are associated with atherosclerotic disease in the common carotid artery3. Atherosclerotic lesions are most commonly found at branch points and bifurcations of blood vessels. Carotid artery atherosclerosis typically affects the carotid bifurcation4. Several noninvasive imaging techniques are currently used to evaluate atherosclerotic carotid artery stenosis5,6,7,8,9, including MRI7, CT8, and Doppler ultrasound9. CT angiography is a robust technique for assessing calcification and quantifying atherosclerosis load in the carotid arteries10.

Panoramic radiography (PR) is a standard imaging technique routinely used in initial examinations at dental clinics. Several studies have described the usefulness of PR as a screening tool for diagnosing carotid artery calcification (CAC) and have investigated the prevalence of CAC in various populations11,12,13,14. CAC evaluation on PRs could aid in the identification of patients at high risk of cerebrovascular accident15. However, distinguishing CAC on PRs from other calcified anatomical structures or pathoses is difficult and subject to subjective interpretation16. Even dental professionals with appropriate training find it challenging to accurately diagnose CAC on PRs, as this requires considerable experience and expertise17. Consequently, conventional diagnosis of CAC from PRs has limited accuracy. The introduction of an automatic assistance method could mitigate inter-examiner variability and facilitate a more reliable and precise evaluation of CAC on PRs18.

Deep learning-based approaches have gained importance in medical image analysis tasks such as classification, detection, and segmentation19. To detect CAC on PRs, Kats et al. applied Faster R-CNN with data augmentation by flipping, rotating, and changing the brightness to overcome the small size of their dataset, and reported a sensitivity of 0.75 and specificity of 0.8020. Another study compared InceptionResNetV2, DenseNet169, and EfficientNetV2M models for detecting CAC on PRs, using transfer learning to address the small number of image samples; the best-performing model had a sensitivity of 0.82 and specificity of 0.9721. There have been several studies on the automatic segmentation of CAC from MR, CT, and ultrasound images using deep learning. A two-stage deep learning-based method was proposed for segmentation of carotid atherosclerotic plaques in multi-weighted MR images22. A deep learning-based method was developed to segment CAC in CT angiography23. A deep learning network combining U-Net and DenseNet was used to segment carotid plaques in ultrasound images24. Segmentation of CACs has proven beneficial for quantitatively assessing cardiovascular disease risk by providing detailed information about their size, location, and shape. However, to date, no studies have investigated automatic segmentation of CAC lesions on PRs using deep learning.

Providing accurate ground truth data is essential for obtaining high performance in deep learning models. However, accurate labeling of the ground truth for CAC lesions on 2D projection PRs poses a challenge for automatic segmentation of CAC using deep learning. This challenge arises from the difficulty of precisely distinguishing CAC lesions from surrounding anatomical structures, compounded by the large pathological variations exhibited by the lesions. Therefore, the purpose of this study was to automatically and robustly classify and segment CACs on PRs that overlap with anatomical structures and show large variations in size, shape, and location, using a cascaded deep learning network. The model consists of cascaded deep learning networks: a classification network that labels PRs as normal or abnormal, and a segmentation network that delineates CAC lesions in abnormal images only, providing accurate and robust segmentation of CACs on PRs. Our main contributions are as follows. (1) We developed a cascaded deep learning network (CACSNet) for automatic and robust classification and segmentation of CACs on PRs, trained on accurate ground truth data established with reference to CT images. (2) We employed various backbones and the Tversky loss function with weights optimized to balance precision and recall to improve our network's CAC segmentation performance.

Materials and methods

Data acquisition and preparation

We retrospectively obtained 400 panoramic radiographs (PRs) from patients (242 females and 158 males; mean age 71.01 ± 7.96 years) who visited Seoul National University Dental Hospital from 2009 to 2022. PRs with dimensions of 1976 × 976 pixels were collected from an OP-100 digital panoramic machine (Instrumentarium Dental, Nahkelantie, Finland) operated at 70 kVp and 8–10 mA. In addition, PRs with dimensions of 2988 × 1468 pixels (resized to 1976 × 976 pixels) were obtained from a Ray-alpha unit (Ray, Hwaseong-si, Korea) at 71 kVp and 12–14 mA. CT reference images were obtained using an MDCT scanner (Somatom Sensation 10, Siemens AG, Erlangen, Germany) at 120 kVp and 130 mA, with voxel dimensions of 0.469 × 0.469 × 0.5 mm3, 512 × 512 pixels, and 16-bit depth. Another MDCT scanner (Somatom Definition Edge, Siemens AG, Erlangen, Germany) was used to acquire images with voxel sizes of 0.37 × 0.37 × 0.6 mm3, dimensions of 512 × 512 pixels, and 12-bit depth at 120 kVp and 120 mA. This study was performed with approval from the institutional review board of Seoul National University Dental Hospital (ERI23031). The ethics committee approved a waiver of informed consent because this was a retrospective study. The study was performed in accordance with the Declaration of Helsinki.

A single PR per patient was used for analysis. To accurately establish the presence and region of carotid artery calcification (CAC) on the PR, the ground truth was determined with reference to the CT image of the same patient, as CT is the standard imaging tool for identifying and quantifying cardiovascular calcification25 and is superior to other imaging modalities for visualizing calcifications26 (Fig. 1). The inclusion criterion was the availability of a CT scan taken within one year before or after the PR. The PR was considered abnormal if calcification of the carotid artery was evident on both the PR and the CT image, and normal if no calcification of the carotid artery was observed on either the PR or the CT image. Exclusion criteria were (a) inadequate image quality for diagnostic purposes on either the PR or the CT image; (b) significant distortion or obscuration of the carotid artery region in the PR; and (c) extensive surgical procedures carried out near the area of interest. The data analysis of ROIs and CAC lesions is given in Supplementary Fig. S1.

Figure 1

(a) Panoramic radiograph (PR) and (b) CT axial image for a patient. CT images were used as reference for CAC annotation on the PR. Yellow areas indicate carotid artery calcification lesions on the PR. Red dashed circles represent carotid artery calcification on the CT image.

Half of the 400 PRs were abnormal cases with CAC in the left and/or right regions, while the other half were normal cases without CAC in either region (Table 1). An oral radiologist with more than ten years of experience manually annotated the shape, location, and size of CACs on the PRs based on the CT reference images for classification and segmentation, using annotation software (Supervisely OU, https://supervisely.com, Estonia) (Fig. 1). Two other oral radiologists with more than two decades of experience verified the annotated images with reference to the CT images (Fig. 1). Left and right regions of interest (ROIs) including CAC lesions were automatically cropped to 512 × 512 pixels at the bottom left and bottom right of the image, respectively, and the right regions were horizontally flipped to match the left (Supplementary Table S1). The cropping size of the ROI was determined empirically to include the various CAC lesions. Min–Max normalization was applied to the cropped images. The dataset was randomly divided into training, validation, and test sets at a 3:1:1 ratio (Table 1). The overall procedures are illustrated in Fig. 2.
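As an illustration only (not the authors' implementation), the ROI preparation described above can be sketched in Python as follows, assuming each PR is loaded as a NumPy array; the function names are hypothetical.

```python
import numpy as np

ROI_SIZE = 512  # empirically chosen crop size (pixels)

def crop_carotid_rois(pr_image: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Crop 512 x 512 ROIs from the bottom-left and bottom-right corners of a
    1976 x 976 panoramic radiograph and mirror the right ROI to match the left."""
    h, w = pr_image.shape
    left_roi = pr_image[h - ROI_SIZE:h, 0:ROI_SIZE]
    right_roi = pr_image[h - ROI_SIZE:h, w - ROI_SIZE:w]
    right_roi = np.fliplr(right_roi)  # horizontal flip to match the left side
    return left_roi, right_roi

def min_max_normalize(roi: np.ndarray) -> np.ndarray:
    """Scale pixel intensities to [0, 1] (Min-Max normalization)."""
    roi = roi.astype(np.float32)
    return (roi - roi.min()) / (roi.max() - roi.min() + 1e-8)
```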

Table 1 Dataset configuration of normal and abnormal panoramic radiographs used for deep learning.
Figure 2

Overall procedures of the proposed method. (a) Data collection and carotid artery calcification (CAC) labeling with reference to CT image; (b) automatic ROI cropping and matching to the left; (c) training of the CAC classification network; (d) training of the CAC segmentation network; (e) prediction and evaluation processes of CACSNet.

Network architecture of CACSNet

We designed a cascaded deep learning network (CACSNet) that performs both classification and segmentation of CAC lesions (Fig. 2). The classification and segmentation networks for CACs are shown in Fig. 3. In the classification network of CACSNet, PR images were classified as normal or abnormal using CNN-based backbones (VGG16, MobileNet v2, ResNet101, DenseNet121, and EfficientNet-B4). In the segmentation network, CAC lesions were segmented only in images classified as abnormal, using a U-Net architecture with these backbones as encoders and the Tversky loss. We used Grad-CAM to interpret the decision-making processes of the CAC classification network. Grad-CAM generates heatmaps of the regions that a deep learning model focuses on when making a prediction40. It calculates the gradients of the predicted class score with respect to the feature maps of the final convolutional layer and computes their weighted sum to produce a heatmap of the regions that contributed most to the classification output40.
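For illustration, the Grad-CAM computation described above can be sketched as follows (PyTorch assumed); this is a generic implementation, not the code used in this study, and the layer handling is simplified.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer):
    """Grad-CAM: weight the final convolutional feature maps by the
    global-average-pooled gradients of the predicted class score."""
    activations, gradients = {}, {}

    def fwd_hook(module, inputs, output):
        activations["maps"] = output

    def bwd_hook(module, grad_input, grad_output):
        gradients["maps"] = grad_output[0]

    h_fwd = target_layer.register_forward_hook(fwd_hook)
    h_bwd = target_layer.register_full_backward_hook(bwd_hook)

    logits = model(image)                      # image: (1, C, H, W)
    class_idx = int(logits.argmax(dim=1))      # predicted class (normal/abnormal)
    model.zero_grad()
    logits[0, class_idx].backward()            # gradient of the class score

    h_fwd.remove()
    h_bwd.remove()

    weights = gradients["maps"].mean(dim=(2, 3), keepdim=True)      # (1, K, 1, 1)
    cam = F.relu((weights * activations["maps"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)       # heatmap in [0, 1]
```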

Results

Classification and segmentation performances of the networks were evaluated using a test set that was not used for training. We tested several CNN-based backbones, namely VGG16, MobileNet v2, ResNet101, DenseNet121, and EfficientNet-B4, in the CAC classification network. We also evaluated several backbones as encoders in the CAC segmentation network, namely VGG16, ResNet101, DenseNet121, SENet, and EfficientNet-B4. All networks were trained with the same data augmentation in the same computational environment to ensure a fair comparison.

Table 2 shows the quantitative results of CAC classification performance according to backbone. CACSNet with EfficientNet-B4 had fewer false positives and negatives than CACSNet with the other backbones (Fig. 4), and achieved the highest accuracy, sensitivity, and specificity values of 0.985, 0.980, and 0.988, respectively (Table 2). CACSNet with EfficientNet-B4 and MobileNet v2 obtained the highest AUC value of 0.996 for classification (Fig. 5). There was a significant difference in AUC between VGG16 and MobileNet v2 (p < 0.05) and between VGG16 and EfficientNet-B4 (p < 0.05). To interpret the decision-making processes of CACSNet, Grad-CAM was used to visualize the activation regions that contributed the most to the classification output. CACSNet with EfficientNet-B4 focused more densely on CAC lesions than CACSNet with the other backbones (Fig. 6). Heatmaps for EfficientNet-B4 showed more sensitive and robust activation on CACs with large variations in shape (Fig. 6a), size (Fig. 6b), and location (Fig. 6c), whereas those for the other backbones showed sparse or wide activations on irrelevant anatomical structures such as near the hyoid bone, cervical spine, and mandible. Heatmaps for EfficientNet-B4 also focused more densely than the other backbones on specific regions of CACs that overlapped with surrounding anatomical structures (Fig. 6d–f). In Fig. 6, cases predicted as false negatives by the other backbones showed sparse and wide activation regions on irrelevant anatomical structures. As a result, CACSNet with EfficientNet-B4 provided classification outputs with a more detailed, precise, and accurate focus than the other networks on CACs with large variations in size, shape, and location, as well as on CACs overlapping with surrounding anatomical structures.
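For reference, the classification metrics reported in Table 2 can be reproduced from test-set predictions with a generic sketch such as the following (scikit-learn assumed); this is not the evaluation code used in this study.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, sensitivity, and specificity from binarized predictions, plus AUC."""
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # recall of the abnormal (CAC) class
        "specificity": tn / (tn + fp),
        "auc": roc_auc_score(y_true, y_prob),
    }
```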

Table 2 Classification performance of CACSNet with different backbones for carotid artery calcification.
Figure 4

Confusion matrices for classification of carotid artery calcification by CACSNet with backbones of (a) VGG16, (b) MobileNet v2, (c) ResNet101, (d) DenseNet121, and (e) EfficientNet-B4. The false positives and negatives by EfficientNet-B4 are shown in Supplementary Fig. S2.

Figure 5

Receiver operating characteristic (ROC) curves for classification of carotid artery calcification by CACSNet with backbones of VGG16, MobileNet v2, ResNet101, DenseNet121, and EfficientNet-B4. There was a significant difference in AUC between VGG16 and MobileNet v2 (p < 0.05) and between VGG16 and EfficientNet-B4 (p < 0.05).

Figure 6

Grad-CAM results of classification by CACSNet with backbones of VGG16, MobileNet v2, ResNet101, DenseNet121, and EfficientNet-B4. Bright red indicates that the corresponding region contributes strongly to the decision of the model. Grad-CAM results for (a–c) CACs with large variations in size, shape, and location, and (d–f) those overlapping with surrounding anatomical structures. TP and FN indicate true positives and false negatives, respectively, after classification. The ground truth represents the original image with annotation (red line) for each case.

To determine the optimal \(\alpha \) and \(\beta \) weights of the Tversky loss in the segmentation network of CACSNet, an ablation study was performed by asymmetrically varying the two weights. Table 3 shows the CAC segmentation performance when adjusting the \(\alpha \) and \(\beta \) weights of the Tversky loss in CACSNet with EfficientNet-B4. As the \(\alpha \) weight gradually increased, the precision value increased but the recall value decreased; as the \(\beta \) weight gradually increased, the recall value increased but the precision value decreased. These results indicate a trade-off between precision and recall depending on the \(\alpha \) and \(\beta \) weight values of the Tversky loss. With weights of \(\alpha =0.6\) and \(\beta =0.4\), CACSNet with EfficientNet-B4 achieved the highest JI and DSC values of 0.595 and 0.722, respectively, while maintaining a balance between precision and recall. An additional ablation study was performed by changing the loss function among binary cross-entropy loss (BCL), Jaccard index loss (JIL), Dice similarity loss (DSL), and Tversky loss (TVL) in the segmentation network (Table 4). TVL achieved the highest JI and DSC values of all loss functions in CACSNet with EfficientNet-B4.
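A minimal sketch of the Tversky loss with the \(\alpha \) and \(\beta \) weighting discussed above is given below (PyTorch, binary segmentation); it is illustrative only, and the smoothing constant is an assumption.

```python
import torch

def tversky_loss(pred_logits, target, alpha=0.6, beta=0.4, smooth=1e-6):
    """Tversky loss: alpha penalizes false positives, beta penalizes false negatives.
    With alpha = beta = 0.5 it reduces to the Dice loss."""
    pred = torch.sigmoid(pred_logits).reshape(-1)   # predicted foreground probabilities
    target = target.reshape(-1).float()             # binary ground truth mask
    tp = (pred * target).sum()
    fp = (pred * (1.0 - target)).sum()
    fn = ((1.0 - pred) * target).sum()
    tversky_index = (tp + smooth) / (tp + alpha * fp + beta * fn + smooth)
    return 1.0 - tversky_index
```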

Table 3 Ablation study of \(\alpha \) and \(\beta \) weights of Tversky loss of CACSNet with EfficientNet-B4.
Table 4 Ablation study for loss functions of CACSNet with EfficientNet-B4.

Table 5 shows the CAC segmentation performance of CACSNet with different backbones. CACSNet with EfficientNet-B4 demonstrated the best segmentation performance (JI, DSC, precision, and recall of 0.595, 0.722, 0.749, and 0.756, respectively). Compared to CACSNet with the other backbones, CACSNet with EfficientNet-B4 showed better segmentation results with fewer false positives and negatives for CAC lesions with large variations in size, shape, and location (Fig. 7a–c), and for those that overlapped with surrounding anatomical structures (Fig. 7d–f). As a result, CACSNet with EfficientNet-B4 more accurately predicted CAC lesions with large morphological variations and overlaps. Nonetheless, CACSNet produced false positive errors near the hyoid bone (Fig. 8a) and the thyroid cartilage (Fig. 8b), and false negatives for CACs that overlapped with the cervical spine (Fig. 8c) and the posterior pharyngeal wall (Fig. 8d). The segmentation performance of CACSNet with EfficientNet-B4 according to the size and mean pixel value of CAC lesions is shown in Supplementary Fig. S3.
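For clarity, the segmentation metrics reported in Table 5 (JI, DSC, precision, and recall) can be computed from binary masks as in the following generic sketch, which is not the authors' evaluation code.

```python
import numpy as np

def segmentation_metrics(pred_mask: np.ndarray, gt_mask: np.ndarray, eps=1e-8):
    """Jaccard index (JI), Dice similarity coefficient (DSC), precision, and recall
    for a pair of binary segmentation masks."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    return {
        "JI": tp / (tp + fp + fn + eps),
        "DSC": 2 * tp / (2 * tp + fp + fn + eps),
        "precision": tp / (tp + fp + eps),
        "recall": tp / (tp + fn + eps),
    }
```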

Table 5 Segmentation performance of CACSNet with different backbones for carotid artery calcification.
Figure 7

Segmentation results by CACSNet with backbones of VGG16, ResNet101, DenseNet121, SENet, and EfficientNet-B4. Blue, green, and yellow regions indicate false negatives, false positives, and true positives, respectively. Segmentation results for (a–c) CACs with large variations in size, shape, and location, and (d–f) those overlapping with surrounding anatomical structures. The ground truth represents the original image with annotation (red line) for each case.

Figure 8

Segmentation errors predicted by CACSNet with different backbones. Blue, green, and yellow regions indicate false negatives, false positives, and true positives, respectively. (a) CACSNet predicted the hyoid bone (green arrow) and the thyroid cartilage (red arrow) as CAC (false positives), (b) CACSNet predicted the thyroid cartilage (red arrow) as CAC (false positives), (c) CACSNet could not accurately predict CAC regions (yellow arrow) overlapping with the cervical spine (false negatives), (d) CACSNet could not accurately predict CAC regions (blue arrow) overlapping with the posterior pharyngeal wall (false negatives). The ground truth represents the original image with annotation (red line) for each case.

Bland–Altman plots of area differences between the ground truth and segmentation predictions from CACSNets with different backbones revealed that EfficientNet-B4 showed higher linear relationships and better agreement limits than the other backbones and presented consistent segmentation performance across various sizes of CACs (Fig. 9). EfficientNet-B4 also showed smaller variation in segmentation performance than the other backbones when DSC values were plotted according to horizontal or vertical locations of CACs (Fig. 10). Furthermore, the classification network of CACSNet improved segmentation performance by reducing false negative CAC segmentation results (Table 6). Therefore, CACSNet with EfficientNet-B4 was robust to large morphological variations and overlap of CAC lesions with anatomical structures over the entire posterior inferior region of the mandibular angle.
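As an illustrative sketch (not the analysis code used here), the bias and 95% limits of agreement underlying the Bland–Altman plots can be computed from the ground truth and predicted CAC areas as follows.

```python
import numpy as np

def bland_altman_stats(gt_areas, pred_areas):
    """Mean difference (bias) and 95% limits of agreement between ground-truth
    and predicted CAC areas (in pixels)."""
    gt = np.asarray(gt_areas, dtype=float)
    pred = np.asarray(pred_areas, dtype=float)
    diff = gt - pred
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)   # 95% limits of agreement
    return bias, bias - half_width, bias + half_width
```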

Figure 9

Bland–Altman plots of the ground truth and segmentation results by CACSNet with backbones of (a) VGG16, (b) ResNet101, (c) DenseNet121, (d) SENet, and (e) EfficientNet-B4. Blue dots are area differences (pixels) between the ground truth and segmentation results. Black dashed and red lines are 95% limits of agreement and mean difference, respectively.

Figure 10

Line plots for Dice similarity coefficient score (DSC) of segmentation results by CACSNet with backbones of VGG16 (black), ResNet101 (cyan), DenseNet121 (green), SENet (blue), and EfficientNet-B4 (red) according to (a) CAC horizontal locations (from the antegonial notch of the mandible to the cervical spine); DSC values are plotted according to x coordinates of the center of mass for the ground truth image, (b) CAC vertical locations (from the cervical spine C5 to C2); DSC values are plotted according to y coordinates of the center of mass for the ground truth image.

Table 6 Segmentation performance of CACSNet with EfficientNet-B4 with and without a classification network.

Discussion

Stroke is the second-leading cause of death worldwide1. A significant percentage of strokes are caused by carotid artery atherosclerosis; early detection of carotid artery atherosclerosis can therefore reduce the risk of stroke3,41. PR is a routinely used imaging technique in dental practice that can be used to visualize calcified atheromas of the carotid artery15. Therefore, PR could potentially be used to screen for and detect asymptomatic CAC patients among the general population42. However, the diagnostic capacity of conventional PRs is not sufficient for reliable and precise evaluation of CAC17,43. In addition, dentists may not pay close attention to structures outside the oral cavity such as the carotid arteries. As a result, PR is not commonly utilized for the purpose of screening for CAC in the oral healthcare setting. If CAC can be accurately diagnosed on PR with the help of an automatic method of detection (or classification) and segmentation, dental professionals can screen patients at risk of stroke more easily and ensure that patients receive appropriate treatment44. Therefore, we developed CACSNet for automatic robust classification and segmentation of CACs with large variations in size, shape, and location, as well as CACs overlapping with surrounding anatomical structures.

Unlike images commonly used for computer vision tasks, PRs are consistently captured with patients positioned at a specific location and angle; consequently, structures on PRs tend to exhibit consistent patterns in similar locations45. On PRs, CAC typically appears as an irregular nodular radiopacity adjacent to the cervical vertebrae, close to the intervertebral space of C3 and C4, and posteroinferior to the mandibular angle and the hyoid bone46; however, its location is variable and can lie outside the region covered by the PR47. Distinguishing CAC with certainty is difficult, and providing accurate ground truth labeling of lesions in 2D projection images (i.e., PRs) makes applications of deep learning for segmentation challenging, because lesions are hard to discriminate accurately from surrounding anatomical structures. Furthermore, CACs vary widely in size, shape, and location within the carotid arteries, as well as in the stage and type of calcification. CACs appear in various and ill-defined shapes, such as irregular, heterogeneous, verticolinear radiopacities on PRs16, and are mostly circular when small and linear or thin and rectangular when enlarged48. As plaque size increases, the calcification area also expands49, typically ranging from 1.5 to 4.0 cm50. Sometimes the CAC is very small and cannot be identified on a PR43. Although differential diagnosis is conducted based on the locations and morphologies of structures, the ability to diagnose CACs on conventional PRs is limited.

Differential diagnosis of CAC involves distinguishing both anatomical and pathological radiopacities. Anatomical structures include the hyoid bone, styloid process, stylohyoid ligament, stylomandibular ligament, thyroid cartilage, triticeous cartilage, epiglottis, and anterior tubercle of the atlas vertebra16. Pathological structures encompass calcified lymph nodes, phleboliths, submandibular salivary gland sialoliths, loose bodies, and tonsilloliths16. Discriminating CAC from these structures on PRs remains challenging, as overlap with other structures increases uncertainty when labeling the image. To improve the performance of deep learning models, it is imperative that the ground truth data used for model training be highly accurate. Previous studies have attempted to overcome these challenges through collaborative efforts among multiple researchers20,21. In this study, we improved labeling accuracy by using only abnormal images that were unequivocally identified as containing CAC in CT images. CT is recognized as the most advanced non-invasive tool for detecting vascular calcification and is a clinically accessible imaging technique51. It is considered the standard imaging tool for the identification and quantification of cardiovascular calcification25, and surpasses other imaging modalities for visualizing calcification26. Accurate ground truth data are thus essential for achieving high performance with deep learning models.

Before the widespread application of deep learning, traditional approaches were used for the automatic detection of CAC on PRs. Fuzzy image contrast enhancement and an algebraic image operator were used to extract calcification regions brighter than their surroundings; the detection rate of this method was 50%52. One study incorporated a support vector machine to reduce misdetections, decreasing the number of false positive cases to 75% of that of the previous method53. A method for detecting CACs using a top-hat filter showed a sensitivity of 93.6% with 4.4 false positives per image, reducing the number of false positives with a rule-based approach and a support vector machine54. Some researchers have attempted to automate the detection of CAC on PRs using deep learning networks20,21. These deep learning models achieved an AUC of 0.83, accuracy of 0.83, sensitivity of 0.75, and specificity of 0.8020, or an accuracy of 0.94, sensitivity of 0.82, and specificity of 0.9721. Our deep learning model with the EfficientNet-B4 backbone achieved an AUC of 0.996, accuracy of 0.985, sensitivity of 0.980, and specificity of 0.988 for CAC classification, demonstrating superior performance to previous models.

In the context of CAC segmentation on PRs, our research is pioneering, and direct benchmarks for comparison are therefore lacking. CACSNet with EfficientNet-B4 backbone achieved a JI of 0.595, DSC of 0.722, precision of 0.749, and recall of 0.756. Several investigations of CAC segmentation in images other than PRs using deep learning models have been conducted22,23,24,55,56,57. The deep learning models achieved a DSC of 0.795 in CT angiographic images23, a DSC of 0.9381 in ultrasound images24, and a DSC of 0.78, precision of 0.76, and recall of 0.8 in MR images22. Therefore, compared with previous studies performed on other modalities, CACSNet demonstrated comparable segmentation performance for CACs on PRs.

In this study, we applied a deep learning model using the U-Net architecture in CACSNet for the segmentation of CAC in PRs. The U-Net architecture includes a contracting path that captures contextual information and a symmetric expanding path that allows for precise localization of segmented areas58. We tested five validated networks (VGG16, ResNet101, DenseNet121, SENet, and EfficientNet-B4) as encoders in the segmentation network, and EfficientNet-B4 achieved the best segmentation performance. This study has several limitations. First, CACSNet produced false positive errors near the hyoid bone and the thyroid cartilage, and false negatives for CACs overlapping with the cervical spine and the posterior pharyngeal wall (Fig. 8). Second, CACSNet may not be widely generalizable due to the use of PRs acquired from a single institution. Generalizability should be improved using large PR datasets acquired under various imaging conditions from multiple centers and devices. Last, we designed a cascaded deep learning model consisting of classification and segmentation networks; this model requires more computational time than single-stage deep learning models. In future studies, we intend to combine the classification and segmentation networks in a single-stage deep learning model based on multi-task learning.
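As a sketch of this design (not the authors' implementation), a U-Net decoder can be paired with a pretrained backbone encoder using the segmentation_models_pytorch package, assuming single-channel ROI inputs and a single CAC output class.

```python
import segmentation_models_pytorch as smp

# U-Net decoder with an ImageNet-pretrained EfficientNet-B4 encoder;
# grayscale PR ROI input (1 channel), binary CAC mask output (1 class).
model = smp.Unet(
    encoder_name="efficientnet-b4",
    encoder_weights="imagenet",
    in_channels=1,
    classes=1,
)
```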

Conclusions

We developed a cascaded deep learning network (CACSNet) to automatically and robustly segment CAC in PRs, with ground truth labeling accurately determined with reference to CT images. CACSNet demonstrated robust classification and segmentation of CAC lesions despite large morphological variations and overlaps with surrounding structures over the entire posterior inferior region of the mandibular angle in PRs. Accurate diagnosis of CAC in PRs using an automatic method with high precision can help dental professionals screen patients at risk of stroke more easily. Future studies should focus on improving the segmentation performance of our method for CAC in PRs by employing advanced deep learning models and data augmentation techniques based on generative AI, along with incorporating more extensive datasets from multiple centers and devices.