Rapid diagnosis of celiac disease based on plasma Raman spectroscopy combined with deep learning

Shi, Tian; Li, Jiahe; Li, Na; Chen, Cheng; Chen, Chen; Chang, Chenjie; Xue, Shenglong; Liu, Weidong; Reyim, Ainur Maimaiti; Gao, Feng; Lv, **aoyi

doi:10.1038/s41598-024-64621-4

Article
Open access
Published: 01 July 2024

Volume 14, article number 15056, (2024)

Scientific Reports

Tian Shi¹^na1^nAff3,
Jiahe Li²^na1,
Na Li^1,3,
Cheng Chen²,
Chen Chen²,
Chenjie Chang²,
Shenglong Xue^1,3,
Weidong Liu^1,3,
Ainur Maimaiti Reyim^1,3,
Feng Gao^1,3 &
…
**aoyi Lv²

Abstract

Celiac Disease (CD) is a primary malabsorption syndrome resulting from the interplay of genetic, immune, and dietary factors. CD negatively impacts daily activities and may lead to conditions such as osteoporosis, malignancies in the small intestine, ulcerative jejunitis, and enteritis, ultimately causing severe malnutrition. Therefore, an effective and rapid differentiation between healthy individuals and those with celiac disease is crucial for early diagnosis and treatment. This study combines Raman spectroscopy with deep learning models to provide a non-invasive, rapid, and accurate method for distinguishing celiac disease patients from healthy controls. A total of 59 plasma samples, comprising 29 celiac disease cases and 30 healthy controls, were collected for experimental purposes. Convolutional Neural Network (CNN), Multi-Scale Convolutional Neural Network (MCNN), Residual Network (ResNet), and Deep Residual Shrinkage Network (DRSN) classification models were employed. The accuracy rates for these models were found to be 86.67%, 90.76%, 86.67% and 95.00%, respectively. Comparative validation results revealed that the DRSN model exhibited the best performance, with an AUC value and accuracy of 97.60% and 95%, respectively. These results support the use of Raman spectroscopy combined with deep learning in the diagnosis of celiac disease.

Introduction

Celiac Disease (CD) is an autoimmune digestive system disorder characterized by impaired fat digestion or absorption, resulting in the excretion of substantial amounts of fat and giving stools a milky appearance¹. Under normal circumstances, the digestive system efficiently breaks down fats, allowing for absorption and transportation into the body. However, in CD patients, this process is disrupted, preventing adequate fat absorption. The presence of large amounts of unabsorbed fats in the intestines can lead to irritation, potentially causing diarrhea. Additionally, the loss of essential nutrients such as fats, proteins, and fat-soluble vitamins may result in malnutrition, leading to various health issues². CD can impact growth and development and compromise the immune system, increasing the risk of infections and other diseases. Individuals with CD commonly experience gastrointestinal symptoms such as diarrhea, abdominal pain, and bloating, significantly affecting their quality of life³.

Research indicates that early diagnosis and treatment of the disease can effectively slow down its progression. Therefore, establishing a rapid and accurate diagnostic method is of paramount importance for achieving early detection of CD and reducing associated damages⁴. Currently, the diagnosis and classification of CD depend on factors such as patient medical history, physical examinations, laboratory findings, and radiological evidence⁵. Diagnostic methods often involve examining fecal fat content⁶ and conducting blood and intestinal mucosal biopsies⁷. However, diagnosing CD remains challenging due to its symptoms being subtle or mistaken for other gastrointestinal issues. The most common reasons for CD screening include abdominal bloating and diarrhea, but these symptoms may not always be evident. This difficulty in diagnosis can lead to delayed or incorrect treatment, exacerbated by significant variations in the course of CD among patients. Some may exhibit mild symptoms, while others experience noticeable clinical manifestations, adding complexity to accurate diagnosis^8,9. Early detection of CD is crucial, as timely intervention and treatment can significantly improve patients' quality of life and slow down disease progression.

As a rapid spectral analysis technique, Raman spectroscopy (RS) can measure various biomolecules present in plasma samples, including proteins, nucleic acids, carbohydrates, and lipids¹⁰. Intensity differences between Raman peaks are primarily attributed to nucleic acids, amino acids, and lipids, which play crucial roles in biochemical reactions such as biological transformations, immune process monitoring, signal transduction, and nutrient metabolism. Because a Raman spectrum captures a wealth of information in multiple peaks, each with a position and intensity that reflect specific substances, it is regarded as a “biological fingerprint” of the sample¹¹.

Despite the significant achievements of Raman spectroscopy combined with machine learning models in disease diagnosis¹², the technique has limitations, such as a low signal-to-noise ratio¹¹. This limitation may hinder the intuitive identification of differences between spectra, potentially resulting in lower diagnostic performance. Therefore, exploring spectral differences through intelligent methods holds substantial significance.

Various chemometric techniques, including Principal Component Analysis (PCA), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN), have been extensively applied in spectral analysis^13,14. However, for extracting more features and achieving diagnostic requirements, more complex deep learning models are needed. Convolutional Neural Network (CNN) is one of the most popular foundational deep learning frameworks, reducing parameter numbers and improving feature extraction quality through local connections and parameter sharing. Traditional CNN models have demonstrated high performance and robustness in processing raw spectral data.

Studies by Yang et al.¹⁵ and Wu et al.¹⁶ showcase the effectiveness of one-dimensional CNNs in accurately classifying plasma lesions of tongue squamous cell carcinoma and rapidly diagnosing sparganosis using plasma Raman spectra, achieving accuracies of 94.90%. However, traditional CNN models can only extract local features at one scale, and the spectral measurement process is often susceptible to strong noise interference, making high-quality feature extraction more challenging. Additionally, differences arising from noise between training and testing sets may decrease spectral classification accuracy.

To address these challenges, this study constructs and adopts four different neural network models: CNN, Multi-Scale Convolutional Neural Network (MCNN), Residual Network (ResNet), and Deep Residual Shrinkage Network (DRSN). These models are trained end-to-end, eliminating the tedious process of manually extracting features. The models can automatically learn critical features from spectral data, enhancing generalization capabilities. By extracting features at different scales, the models effectively capture local information within the spectra, crucial for diagnosing celiac disease. Moreover, these models reduce noise interference during spectral measurement, further enhancing the neural network models' generalization capabilities and diagnostic accuracy for celiac disease.

Materials and methods

Experimental materials

In this study, a pipette was used to deposit 50 μL of plasma onto tin foil-coated slides. After the samples had partially (but not completely) dried in air at room temperature (22 °C), spectra were collected using a high-resolution confocal Raman spectrometer (Gora Raman Spectroscopy, Ideaoptics, China). The excitation wavelength was 785 nm from a YAG laser, with a 15-s integration time and a laser power of 160 mW. Continuous acquisition mode was used to measure the Raman spectra of plasma samples in the range of 500–2500 cm⁻¹. The laser beam was focused on the sample surface through a 50× lens. Other spectral measurement conditions included an 8-s integration time, three integrations, five iterations, and 64 baseline points. The spectral resolution of the spectrometer is 6.06 cm⁻¹. Each sample was measured three times at different positions. The study included 30 healthy control samples and 29 celiac disease samples, resulting in a total of 90 spectra for the control group and 87 for the celiac disease group.

In this study, the data were divided into training and testing sets. Since three spectra were collected for each patient sample, we averaged them and conducted the experiments on the averaged spectra. This ensures that, after the split, spectra from the same patient sample never appear in both the training and testing sets. All plasma samples were taken from fresh blood samples. The samples did not contain any anticoagulants and were collected in the morning after fasting for at least 12 h to avoid interference from dietary and other factors on the blood plasma components. We conducted a correlation analysis of the three spectra from the same patient using the Pearson coefficient; for the different patients, the correlation reached 0.99. The Pearson correlation coefficients among the three spectra of a patient are shown in Fig. 1. The spectral intensity measurements across wavelengths for three randomly selected patients are shown in Fig. 2.
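
As a minimal sketch of the averaging and replicate-correlation steps described above, the following Python snippet uses synthetic data in place of the measured spectra. The array shapes, the number of spectral points (1000), and the helper name `replicate_correlation` are illustrative assumptions, not details taken from the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# 59 patients x 3 replicate spectra; 1000 spectral points is an assumed value.
n_patients, n_replicates, n_points = 59, 3, 1000

# Synthetic stand-in: one smooth curve per patient plus small replicate noise.
base = rng.random((n_patients, 1, n_points)).cumsum(axis=2)
spectra = base + 0.01 * rng.standard_normal((n_patients, n_replicates, n_points))

# Average the three replicates so that each patient contributes exactly one row.
mean_spectra = spectra.mean(axis=1)  # shape: (59, 1000)


def replicate_correlation(patient_spectra):
    """Pearson correlation matrix among the replicate spectra of one patient."""
    return np.corrcoef(patient_spectra)  # each row is treated as one variable


corr = replicate_correlation(spectra[0])  # shape: (3, 3)
```

Because each patient is reduced to a single averaged spectrum before splitting, no patient can be represented in both the training and testing sets.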

All samples were provided by the Autonomous Region People's Hospital. The research protocol was approved by the Ethics Committee of the Autonomous Region People's Hospital (Approval Number: KY2023968173), and pathological examination of plasma confirmed the presence of Celiac Disease.

Data preprocessing

The airPLS algorithm was applied to remove background signals from the Raman spectra¹⁷, followed by a smoothing algorithm for noise elimination¹⁸. The smoothing algorithm used a window length of 5 and a polynomial order of 2. Before normalization, to further reduce noise, a Fourier transform-based filter was applied to remove the low-frequency components of each spectrum.
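
The smoothing, Fourier filtering, and normalization steps can be sketched as follows. This is an illustration under several assumptions rather than the study's pipeline: the smoothing step is taken to be a Savitzky-Golay filter (the stated window length and polynomial order fit that filter, but the text does not name it), the number of removed Fourier coefficients (`n_low`) and the min-max normalization are hypothetical choices, and the airPLS baseline correction is omitted.

```python
import numpy as np
from scipy.signal import savgol_filter


def smooth(spectrum, window_length=5, polyorder=2):
    """Savitzky-Golay smoothing (assumed to be the smoothing algorithm)."""
    return savgol_filter(spectrum, window_length=window_length, polyorder=polyorder)


def remove_low_frequencies(spectrum, n_low=3):
    """Zero the n_low lowest-frequency Fourier coefficients, including the mean."""
    coeffs = np.fft.rfft(spectrum)
    coeffs[:n_low] = 0.0  # n_low is a hypothetical cutoff
    return np.fft.irfft(coeffs, n=spectrum.size)


def min_max_normalize(spectrum):
    """Scale a spectrum to the range [0, 1] (assumed normalization)."""
    lo, hi = spectrum.min(), spectrum.max()
    return (spectrum - lo) / (hi - lo)


# Synthetic spectrum: one peak near 1445 cm^-1, a slow drift, and noise.
rng = np.random.default_rng(0)
shift = np.linspace(500, 2500, 1000)
raw = (
    np.exp(-0.5 * ((shift - 1445) / 10) ** 2)
    + 0.0002 * shift
    + 0.02 * rng.standard_normal(shift.size)
)

processed = min_max_normalize(remove_low_frequencies(smooth(raw)))
```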

The train_test_split() function from the scikit-learn library was employed to partition the preprocessed Raman spectroscopy data into a training set and a testing set, with a ratio of 7:3. The training set consisted of 40 averaged spectra, while the testing set comprised 19. Five-fold cross-validation was performed on the classification model, and the testing set results were used as the final evaluation metrics.
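
A minimal sketch of this partitioning with scikit-learn is given below, using random placeholder features. The explicit `test_size=19` is chosen so that the split matches the reported 40/19 counts; stratification by class, the random seeds, and the feature dimension are assumptions that the text does not specify.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

rng = np.random.default_rng(0)

# Placeholder data: 59 averaged spectra (29 celiac disease, 30 healthy controls).
X = rng.standard_normal((59, 1000))
y = np.array([1] * 29 + [0] * 30)

# Hold out 19 spectra for testing; stratification is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=19, stratify=y, random_state=0
)

# Five-fold cross-validation on the 40 training spectra.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_sizes = [(len(tr), len(va)) for tr, va in cv.split(X_train, y_train)]
```

Each fold trains on 32 spectra and validates on 8, which shows how little data each model sees at this sample size.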

Model evaluation metrics

This study comprehensively assessed the performance of each model in the celiac disease classification diagnosis task using four parameters: accuracy, specificity, sensitivity, and precision. Accuracy represents the percentage of correctly predicted samples out of the total sample count, and the accuracy formula is as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

where TP represents the count of samples correctly classified as positive in the positive sample class. TN represents the count of samples correctly classified as negative in the negative sample class. FP represents the count of samples incorrectly classified as positive in the negative sample class. FN represents the count of samples incorrectly classified as negative in the positive sample class. The formula for specificity is as follows:

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{2}$$

The formula for sensitivity is as follows:

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{3}$$

The precision formula is as follows:

$$\text{Precision} = \frac{TP}{TP + FP} \tag{4}$$

AUC (Area Under the Curve) is a commonly used metric for assessing the performance of binary classification models. The curve in question is the Receiver Operating Characteristic (ROC) curve, which plots the True Positive Rate (also known as sensitivity or recall) against the False Positive Rate at different classification thresholds; the AUC is the area under this curve. It measures the model's ability to correctly distinguish between positive and negative instances across thresholds. A value closer to 1 indicates better classification performance, while a value close to 0.5 suggests that the model's performance is similar to random guessing. The basic structure of the confusion matrix is shown in Table 1.
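
To make the metric definitions concrete, the sketch below computes them for a small invented example. Specificity follows the standard definition TN/(TN + FP), and the AUC is computed through its equivalent pairwise form: the fraction of positive-negative pairs in which the positive instance receives the higher score, with ties counted as one half. The labels, scores, and function names are illustrative and are not study data.

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, with 1 as the positive class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "specificity": tn / (tn + fp),
        "sensitivity": tp / (tp + fn),
        "precision": tp / (tp + fp),
    }


def roc_auc(y_true, scores):
    """AUC as the fraction of correctly ordered (positive, negative) pairs."""
    y_true, scores = np.asarray(y_true), np.asarray(scores, dtype=float)
    diff = scores[y_true == 1][:, None] - scores[y_true == 0][None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)


# Invented example: 4 positive and 6 negative samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.05]
y_pred = [int(s >= 0.5) for s in scores]  # threshold of 0.5

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

In this example TP = 3, TN = 5, FP = 1 and FN = 1, so the accuracy is 0.8, the sensitivity and precision are both 0.75, the specificity is 5/6, and the AUC is 23/24.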

Table 1 RS and SERS peak positions and vibrational mode assignments.

Ethics statement

All experiments in this study were performed in strict accordance with relevant principles, regulations, and guidelines. All samples were provided by the People's Hospital of the Autonomous Region. The research protocol received approval from the Ethics Committee of the People's Hospital of the Autonomous Region (Approval Number: KY2023968173). This study confirms that informed consent was obtained from all subjects and/or their legal guardians.

Results and analysis

Principal component analysis and interpretability

Considering the characteristics of the dataset and the requirements of the analysis task, we retained the principal components that together explain 85% of the total variance of the Raman spectral data, and used the explained variance ratio attribute to obtain the variance explained by each principal component. We standardized the spectral data using the z-score and applied PCA to visualize the data distribution (Fig. 3), which shows how the samples spread along the principal components.
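
A minimal sketch of this step with scikit-learn is shown below on synthetic data. Passing a fraction as `n_components` asks `PCA` to keep the smallest number of components whose cumulative explained variance reaches that fraction. The data dimensions, the low-rank structure of the placeholder data, and the explicit `svd_solver="full"` are assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Placeholder data: 59 spectra driven by 5 latent factors plus noise.
latent = rng.standard_normal((59, 5))
loadings = rng.standard_normal((5, 200))
X = latent @ loadings + 0.1 * rng.standard_normal((59, 200))

# z-score standardization of every spectral feature.
X_std = StandardScaler().fit_transform(X)

# Keep enough components to explain at least 85% of the total variance.
pca = PCA(n_components=0.85, svd_solver="full")
scores = pca.fit_transform(X_std)

# Variance fraction explained by each retained component.
ratios = pca.explained_variance_ratio_
```

The first two columns of `scores` are what a two-dimensional sample plot would display, while `pca.components_` holds the per-feature weights of each component.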

Figure 4a illustrates the projection direction of the data in the principal component space. Specifically, each point represents a feature (in this case, possibly the original features or PCA components), rather than a sample. The coordinates of each point represent the weights on the corresponding principal components. Therefore, the left plot helps us understand the contribution of original features to the principal components and the relationships between the components. Figure 4b displays the projection of each sample in the principal component space. Each point represents a sample, and its coordinates are its projection values on the principal components. This plot helps us observe the distribution of samples along the principal component directions and the distinctiveness between different categories.

To visualize the variance of the entire dataset, we plotted a graph (as shown in Fig. 5) displaying the standard deviation of the entire dataset.

In spectral analysis, each wavelength corresponds to a feature. The spectral standard deviation in Fig. 5 reflects the dispersion of data at each wavelength. By showcasing the distribution of spectral standard deviation, we can understand the variability of data at different wavelengths and the dispersion of data across the spectral range. This helps illustrate how each sample in the dataset spreads in terms of spectral standard deviation and principal component analysis. The contributions of each wavelength to the two principal components are shown in Fig. 6.

Through the aforementioned visualizations, we can observe the variation and distribution of data across different feature dimensions. However, the distribution of the data after PCA is not ideal for separating the two groups, and advanced classification tools such as neural networks require richer features for training and classification. This is one of the reasons for choosing more advanced classification tools like deep learning models.

Raman spectroscopy

The Raman spectra of plasma from patients with celiac disease (CD) are shown in Fig. 1, where the Raman characteristic peaks represent substances rich in lipids, proteins, nucleic acids, and amino acids in the tissue. Previous studies have indicated that changes in Raman peaks of proteins and nucleic acids may be observed in the plasma of diseased individuals, reflecting abnormal expression of cellular nucleic acids and proteins¹⁸. Additionally, CD patients exhibit higher levels of high-sensitivity C-reactive protein in their plasma, and in terms of the lipoprotein spectrum, CD patients show lower levels of high-density lipoprotein cholesterol (HDL-C)¹⁹. The serum of CD patients is characterized by lower levels of various metabolites (such as amino acids, lipids, ketones, and choline) (P < 0.01)²⁰. Comparative experiments have revealed that, in terms of lipids, the main differences between celiac disease patients and the control group are a decrease in cholesterol and phospholipids in both high-density lipoprotein and low-density lipoprotein in the former. These differences persist after treatment, and a lower level of cholesterol in very-low-density lipoprotein (VLDL) has also been observed²¹. Table 1 lists the major characteristic peaks of plasma in celiac disease, along with the assignment of each feature peak. As shown in Fig. 7, patients with celiac disease exhibit Raman peaks at 1402 cm⁻¹, 1477 cm⁻¹, 1518 cm⁻¹, 1545 cm⁻¹, 1715 cm⁻¹ and 1772 cm⁻¹, in their plasma, which are higher than those in normal controls. However, the peak at 1445 cm⁻¹ is lower than in normal controls. Significant differences exist between celiac disease patients and healthy controls in terms of functionality, tissue structure, and surface features in plasma. 
Specifically, the notable Raman peak difference at 1402 cm⁻¹ reflects differences in bending modes of methyl groups between the two groups, indicating potential abnormal lipid metabolism in celiac disease patients, such as damage to adipose tissue due to malabsorption of fat²². As shown in Table 2, the Raman peak at 1477 cm⁻¹ reflects calcium oxalate in the patient's plasma, exhibiting significant changes compared to healthy plasma²³. Celiac disease is an immune-related disease that may involve an abnormal immune response to proteins in the intestines. This may lead to observed Raman peak differences in celiac disease patients, reflecting changes in protein structure or composition. In celiac patients, changes in lipid and protein composition are related to alterations in cell membrane structure and function due to damage to the intestinal mucosa. Additionally, celiac disease is often accompanied by inflammation and the formation of immune complexes. These biological processes may cause changes in the intra- and extracellular environment, including the distribution and structure of lipids and proteins. The Raman peak difference at 1518 cm⁻¹ is attributed to differences in cytosine content. In celiac patients, the impact on nucleotides, including changes in concentration or structure, may occur due to intestinal damage. Changes in the expression level of phenylalanine are reflected in the Raman peak at 1545 cm⁻¹, indicating the metabolic status, redox balance, and regulation of some physiological functions. Differences in the Raman spectrum of C=O vibration at 1715 cm⁻¹ and 1772 cm⁻¹ are lipid-related, as celiac disease is a malabsorption disease. Therefore, if significant differences in C=O vibration are detected in celiac patients, it implies abnormal lipid metabolism or changes in lipid composition, which are related to the absorption and metabolism of fat in the intestines²².

Table 2 The major Raman bands and their corresponding assignments³⁵.

Model evaluation

Convolutional neural network (CNN) model evaluation

Convolutional Neural Network (CNN) is a deep feedforward neural network with features such as local connections and weight sharing. As one of the representative algorithms of deep learning, CNN has significant advantages in complex machine learning problems such as image classification, computer vision, natural language processing^24,25,26,27, making it one of the most widely used models. The components of CNN include basic input and output layers, as well as convolutional layers, pooling layers, and fully connected layers²⁸. The convolutional layer is used to extract different features of the input data, which may only be able to extract some low-level features. Most convolution operations can iteratively extract more complex features from low-level features. Then, the pooling layer is used to reduce the dimensionality of the features, achieving feature invariance. As is shown in Fig. 8a, after multiple convolution and pooling operations, all local features are combined into global features in the fully connected layer. In this experiment, the CNN model mainly includes four Conv1D layers with 32, 64, 64, and 32 filters, as well as 2 neurons. A Dropout layer is added after each Dense layer to prevent the problem of model overfitting.

The ROC curve of the CNN is shown in Fig. 9. Compared to machine learning models, CNN shows improvement in classification accuracy, specificity, and sensitivity. However, CNN still makes errors in recognizing a considerable number of samples.

Multi-scale convolutional neural network (MCNN) evaluation

MCNN is a simple yet effective multi-scale convolutional neural network that can map the input to its corresponding density map²⁹. MCNN has stronger universality for input information. By using filters of different sizes with different receptive fields, the features learned by convolutional neural networks at different scales have stronger adaptability due to the perspective effect. The MCNN used in this experiment consists of Conv1D layers, LeakyReLU layers, pooling layers, and Conv1D layers. As shown in Fig. 8b, three convolutional layers are used, with 16, 32, and 64 filters, and kernel sizes of 4, 8, and 16, respectively. The stride is 1, and “same” padding is used. MCNN outperforms the CNN model in accuracy, and the model's runtime is similar to that of CNN. The ROC curve of MCNN is shown in Fig. 9. From the confusion matrix, it can be seen that MCNN is more powerful in classifying positive samples, which is crucial for the diagnosis of celiac disease.
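
The idea of multi-scale feature extraction can be illustrated with a small NumPy sketch. It applies three convolution branches with the filter counts and kernel sizes quoted above, stride 1, and “same” output length, then concatenates the resulting feature maps. Treating the three layers as parallel branches on the same input, the random untrained kernels, the LeakyReLU slope, and the input length are assumptions for illustration; pooling and the classifier head are omitted, and the alignment of even-sized kernels may differ from that of a given deep learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv1d_same(signal, kernels):
    """Apply each kernel to a 1-D signal with stride 1 and 'same' output length."""
    return np.stack([np.correlate(signal, k, mode="same") for k in kernels])


def leaky_relu(x, alpha=0.01):
    """LeakyReLU activation; the slope alpha is an assumed value."""
    return np.where(x > 0, x, alpha * x)


spectrum = rng.standard_normal(500)  # placeholder spectrum, assumed length

# (number of filters, kernel size) for each branch, as quoted in the text.
branches = [(16, 4), (32, 8), (64, 16)]

feature_maps = []
for n_filters, kernel_size in branches:
    kernels = 0.1 * rng.standard_normal((n_filters, kernel_size))  # untrained
    feature_maps.append(leaky_relu(conv1d_same(spectrum, kernels)))

# Stack the branch outputs along the channel axis: 16 + 32 + 64 = 112 channels.
features = np.concatenate(feature_maps, axis=0)
```

Larger kernels respond to broader spectral structures and smaller kernels to narrow peaks, so the concatenated channels describe the same spectrum at several scales.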

Evaluation of deep residual network (ResNet)

ResNet, as a powerful deep neural network structure, has been widely applied to disease assessment tasks³⁰. Its design of residual learning makes the network easier to train and enables deeper feature exploration in images. In disease assessment, ResNet can learn complex features and patterns in medical images, thereby improving the accuracy and robustness in disease diagnosis. The ResNet used in this experiment consists of multiple convolutional blocks, each including a convolutional layer and a batch normalization layer. It also incorporates multiple residual connection blocks, each containing two convolutional blocks and possible convolutional layers for shortcut connections. Dropout layers are added after each residual connection block to prevent overfitting (Fig. 8c). The ResNet model can capture deep features in images and identify potential pathological information. Its structure of direct connections between layers enables better information transmission, alleviating the vanishing gradient problem, and reducing the risk of overfitting. However, its performance on celiac disease spectral data is not superior to that of convolutional neural networks. The ROC curve of ResNet is shown in Fig. 9.

Evaluation of deep residual shrinkage network (DRSN)

The Deep Residual Shrinkage Network (DRSN), as a deep learning model, is particularly suitable for features related to noise. It effectively addresses noise and redundant information in spectra, enhancing its learning and feature extraction capabilities for disease features³¹. Built upon ResNet, DRSN introduces improvements by setting a threshold for each channel and incorporating two fully connected layers. As is shown in Fig. 8d, the second fully connected layer outputs neurons equal to the number of input feature map channels, and each neuron undergoes sigmoid activation. DRSN demonstrates significant advantages in handling spectral data³², as its residual block structure facilitates deeper exploration of disease features in plasma spectra. Additionally, the introduced shrinkage mechanism effectively suppresses noise in spectral data, enhancing the model's robustness.
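
The channel-wise shrinkage mechanism described above can be sketched in NumPy as follows. For each channel, the mean absolute activation is passed through two fully connected layers ending in a sigmoid, and the resulting scaling coefficient in (0, 1) is multiplied by that mean to give the channel's threshold, which is then applied by soft thresholding. The weights are random placeholders; the convolutional layers, batch normalization, and identity shortcut of a full residual shrinkage block are omitted; and the function names are illustrative.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def soft_threshold(x, tau):
    """Shrink values toward zero by tau; values within [-tau, tau] become zero."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def channel_wise_shrinkage(x, w1, b1, w2, b2):
    """Apply a learned per-channel soft threshold to x of shape (channels, length)."""
    abs_mean = np.abs(x).mean(axis=1)             # one value per channel
    hidden = np.maximum(w1 @ abs_mean + b1, 0.0)  # first FC layer with ReLU
    alpha = sigmoid(w2 @ hidden + b2)             # second FC layer, one output per channel
    tau = alpha * abs_mean                        # per-channel threshold
    return soft_threshold(x, tau[:, None]), tau


rng = np.random.default_rng(0)
channels, length = 8, 100
x = rng.standard_normal((channels, length))

# Random placeholder weights for the two fully connected layers.
w1, b1 = rng.standard_normal((channels, channels)), np.zeros(channels)
w2, b2 = rng.standard_normal((channels, channels)), np.zeros(channels)

shrunk, tau = channel_wise_shrinkage(x, w1, b1, w2, b2)
```

Because the sigmoid keeps each scaling coefficient strictly between 0 and 1, the threshold always stays below the channel's mean absolute activation, so small, noise-like values are zeroed without removing the whole channel.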

By training on celiac disease and healthy control plasma samples, DRSN can learn spectral features related to the disease, achieving precise extraction of potential biomarkers. The design of its network structure allows information to flow between different levels, enabling the model to better capture complex relationships in plasma spectra. Moreover, DRSN's shrinkage mechanism helps reduce redundant information, improving the signal-to-noise ratio of spectral signals. The ROC curve of DRSN is shown in Fig. 9.

Classification results

Validation results for the CNN, MCNN, ResNet, and DRSN models show that the CNN and MCNN models perform well on the training and validation sets, with accuracies reaching 92.31% and 90.76%, respectively. However, the CNN's specificity is suboptimal at only 85.71%. ResNet exhibits the poorest performance across all metrics, with an accuracy of only 80.23% and a specificity of only 68.57%. All metrics are reported with their associated 95% confidence intervals. In contrast, the DRSN model outperforms CNN, MCNN, and ResNet in accuracy, specificity, sensitivity, and precision. A crucial factor is the enhanced generalization capability of DRSN in the presence of noise. We also compared these models with SVM and KNN. To increase the credibility of the experimental results, this study calculated five evaluation metrics: the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve, accuracy, sensitivity, specificity, and precision. Table 3 presents the evaluation metrics for the test sets of the four models after five-fold cross-validation.

Table 3 Raman spectral model classification results.

Discussion

This study utilized Raman spectroscopy to acquire plasma spectra from patients with celiac disease, revealing differences in the expression of proteins, lipids, amides, and amino acids compared to normal plasma. These differences arise from substantial variances in cellular function and plasma structure between celiac disease and normal plasma cells^33,34. Leveraging these expression disparities is advantageous for establishing models used to evaluate the extent of differences between celiac disease plasma and true control plasma for classification purposes.

The Raman spectra data used in this study consisted of normal plasma data and celiac disease data, showcasing significant differences in the characteristic peaks of celiac disease plasma compared to normal plasma. Therefore, valuable information can be extracted from plasma Raman spectra data. Overall, among the deep learning models, the Deep Residual Shrinkage Network (DRSN), adept at handling noise to enhance signal-to-noise ratio, demonstrated higher accuracy. The proposed deep learning models, in conjunction with celiac disease Raman spectra data, advance technology in the Raman spectroscopy field and enrich diagnostic approaches for celiac disease.

In this study, we found that the Raman spectra of celiac disease and the healthy control group share most common peaks. To reduce noise, we applied Fourier transform for low-frequency component removal. Additionally, to overcome the drawbacks of low signal-to-noise ratio in Raman spectra, which can lead to low diagnostic performance, we established a Raman spectroscopy diagnostic model based on the Deep Residual Shrinkage Network (DRSN). CNN, ResNet, MCNN, and DRSN achieved high accuracy in disease diagnosis by extracting multiscale features from spectral data. Through a comparative analysis of the four deep learning models, this study observed a gradual improvement in the classification efficiency of celiac disease and healthy control group data, from simple two-layer convolution to complex multilayer convolution, and then to parallel multiscale convolution. The DRSN model effectively alleviated spectral noise issues and exhibited efficient classification performance. It successfully classified celiac disease with an accuracy of 92.3%, sensitivity and specificity both reaching 99.1%. From the analysis results, there were no significant differences in the content of proteins, fatty acids, and phospholipids in the plasma of celiac disease patients and the healthy control group. This may be related to the biological behavior of the disease. Deep learning models, by extracting these differing features, provide a theoretical basis for effective diagnostic classification.

The DRSN model successfully suppressed noise in Raman spectra through the introduction of deep residual shrinkage blocks, enhancing the model's generalization performance. Subsequently, we further explored the impact of scaling coefficients and soft thresholding operations in DRSN on model performance, demonstrating the crucial role of these mechanisms in enhancing the model's robustness and noise resistance. Moreover, there is a need for analysis on the interpretability of the model by visualizing the activation values and feature maps of deep learning models. This would aid in understanding the model's focus on different Raman peaks in the process of discriminating celiac disease, providing insights for further research.

In clinical research, it is crucial to include as much information as possible about patient participants. Raman spectroscopy is holistic and not targeted towards individual analytes, thus the entire biological fluid influences the detected signal. Therefore, we supplemented patient information such as age, gender, and geographic location, allowing for further investigation and comparison with future studies. This information is given in Table 4.

Table 4 The relevant patient information.

In conclusion, this study provides strong empirical support for combining Raman spectroscopy and deep learning models for celiac disease diagnosis. Future work could expand the sample size, consider multicenter data to verify the model's robustness, and explore in greater depth how deep learning models can help interpret and discover potential biomarkers for celiac disease. This is crucial for advancing the translational application of spectroscopic diagnostic technology in clinical settings.

Conclusion

In the face of the limited information extracted from conventional plasma assays, the challenges of effectively distinguishing the highly similar spectra exhibited by celiac disease, which are difficult for the human eye to discern, and the presence of noise interference in spectral data, this study further confirmed the high efficiency of deep learning networks for extracting multi-scale features. Specifically, comparing the classification performance of four deep learning models on plasma Raman spectra of celiac disease patients and healthy control groups, this study demonstrates the effectiveness of deep learning networks for feature extraction.

For the DRSN model, its end-to-end learning approach allows direct learning of feature representations from raw spectral data without the need for manually designing feature extractors. This reduces the need for feature engineering and enhances the model's automation. The network structure of DRSN allows information to flow between different levels, enabling the model to learn multi-scale features. This is crucial for capturing spectral information at different levels and aids in comprehensive understanding and extraction of disease-related biomarkers. Through extracting multi-scale and multi-level features from spectral data, DRSN achieves non-invasive, rapid, and low-cost identification of celiac disease patients and healthy control group data.

Ultimately, our results indicate that DRSN achieved the best performance among the four models in classifying celiac disease patients and healthy controls, with an accuracy of 95.00% and an AUC of 97.60%. Comparing the performance of the different models, we conclude that adopting DRSN can effectively improve the accuracy and robustness of disease diagnosis. Furthermore, because it handles spectral noise efficiently, DRSN is well suited to spectral data processing and disease monitoring tasks. It serves as a powerful tool for accurately and efficiently extracting disease features and analyzing spectral data. These findings offer experience and guidance for future research on celiac disease classification in spectroscopic measurement.
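The noise handling attributed to DRSN comes from the soft-thresholding step of deep residual shrinkage networks, in which small-magnitude features are set to zero and larger ones are shrunk toward zero. The sketch below illustrates only that step in plain Python. It is an assumption-laden toy, not the authors' network: the names `soft_threshold` and `channel_threshold` are hypothetical, and the fixed `scale` argument stands in for the value in (0, 1) that a trained network would produce per channel.

```python
from typing import List


def soft_threshold(x: List[float], tau: float) -> List[float]:
    """Shrink each value toward zero by tau; values with |v| <= tau become 0."""
    if tau < 0:
        raise ValueError("tau must be non-negative")
    out = []
    for v in x:
        magnitude = abs(v) - tau
        if magnitude <= 0:
            out.append(0.0)  # treated as noise and suppressed
        else:
            out.append(magnitude if v > 0 else -magnitude)  # keep the sign
    return out


def channel_threshold(features: List[float], scale: float) -> float:
    """Threshold = scale * mean(|features|).

    In a residual shrinkage network, `scale` would be the sigmoid output of
    a small learned sub-network; here it is simply passed in (assumption).
    """
    if not 0.0 < scale < 1.0:
        raise ValueError("scale must lie in (0, 1)")
    return scale * sum(abs(v) for v in features) / len(features)


# Made-up feature values for one channel: two small noise-like entries
# and two larger peak-like entries.
features = [0.05, -0.02, 1.2, -0.8]
tau = channel_threshold(features, scale=0.2)
print("threshold:", tau)
print("shrunk features:", soft_threshold(features, tau))
```

Because the threshold scales with the mean absolute feature value, it adapts to the signal level of each channel instead of being a single hand-tuned constant.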

Data availability

The datasets generated and analyzed during the current study are not publicly available due to data privacy laws, but are available from the corresponding author on reasonable request.

Funding

This work was supported by The Central Leading Local Science and Technology Development Special Fund Project (Autonomous Region Science and Technology Department) (ZYYD2022A06).

Author information

Tian Shi
Present address: Xinjiang Clinical Research Center for Digestive Diseases, No. 91 Tianchi Road, Tianshan District, Urumqi, 830001, Xinjiang Uygur Autonomous Region, China
These authors contributed equally: Tian Shi and Jiahe Li.

Authors and Affiliations

Department of Gastroenterology, People’s Hospital of Xinjiang Uygur Autonomous Region, Urumqi, 830001, Xinjiang Uygur Autonomous Region, China
Tian Shi, Na Li, Shenglong Xue, Weidong Liu, Ainur Maimaiti Reyim & Feng Gao
College of Software, Xinjiang University, Urumqi, 830046, China
Jiahe Li, Cheng Chen, Chen Chen, Chenjie Chang & Xiaoyi Lv
Xinjiang Clinical Research Center for Digestive Diseases, No. 91 Tianchi Road, Tianshan District, Urumqi, 830001, Xinjiang Uygur Autonomous Region, China
Na Li, Shenglong Xue, Weidong Liu, Ainur Maimaiti Reyim & Feng Gao

Contributions

Tian Shi, Jiahe Li, Feng Gao and Xiaoyi Lv designed the study. Tian Shi, Jiahe Li and Cheng Chen analyzed the data and wrote the original draft. Chen Chen, Shenglong Xue and Chenjie Chang designed the substrate. Weidong Liu and Ainur Maimaiti Reyim edited the draft. Feng Gao and Xiaoyi Lv were responsible for supervision, project management, and funding acquisition. All authors have read and agreed to the published version of the manuscript.

Corresponding authors

Correspondence to Feng Gao or Xiaoyi Lv.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Shi, T., Li, J., Li, N. et al. Rapid diagnosis of celiac disease based on plasma Raman spectroscopy combined with deep learning. Sci Rep 14, 15056 (2024). https://doi.org/10.1038/s41598-024-64621-4

Received: 25 January 2024
Accepted: 11 June 2024
Published: 01 July 2024
DOI: https://doi.org/10.1038/s41598-024-64621-4
Springer Nature Limited
