Introduction

Honey, a natural sweetener produced by bees from the nectar of various flowers, has a composition that varies with botanical and geographical origin as well as with processing and storage conditions [1, 2]. Reliable analytical methods for evaluating the quality and authenticity of honey are therefore essential. The components of honey, including carbohydrates, amino acids, vitamins, and minerals, determine its nutritional and therapeutic properties [2]. Honeys from different floral sources contain distinct chemical constituents that confer antimicrobial, anti-inflammatory, and antioxidant properties [2–8]. However, common market problems such as impurities and mislabeling have raised concerns about the quality and authenticity of honey [9].

In honey authentication research, conventional approaches are used to determine botanical origin. Sensory and physicochemical assessments are used to establish the provenance of monofloral honey [10], and the customary practice is melissopalynological analysis, the microscopic examination and identification of the pollen grains present in honey [11]. However, melissopalynology is less applicable to other honey varieties because their pollen content is variable and typically low [12]. In addition, melissopalynological analysis is intricate and relies on a limited number of highly trained experts, likely because of the rigorous training this specialized field demands [13].

The limitations of conventional methods for verifying the botanical and geographical origins of honey underscore the need for more reliable, modern analytical approaches. Advanced analytical instruments and sensor arrays, including chromatography [14], mass spectrometry (MS) [

Results and discussion

In this study, samples of two honey types (pine honey and multi-floral honey) were obtained from various regions. Honey proteins were first extracted from the samples, and their glycans were released and labeled with 2-aminobenzoic acid (2-AA). The 2-AA-labeled glycans were then purified and analyzed by matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS). Peak areas were extracted for the identified glycans, the relative abundance of each glycan was calculated, and statistical and machine learning analyses were then performed.
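The relative-abundance step is not described in detail here. A common choice, assumed in the sketch below, is total-area normalization: each glycan's peak area is divided by the summed area of all glycans quantified in the same spectrum and expressed as a percentage. The function name and the peak areas are hypothetical and serve only to show the arithmetic.

```python
def relative_abundance(peak_areas):
    """Express each glycan's peak area as a percentage of the summed area of
    all glycans quantified in the same spectrum (total-area normalization,
    assumed here rather than taken from the study)."""
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("summed peak area must be positive")
    return {glycan: 100.0 * area / total for glycan, area in peak_areas.items()}


# Hypothetical peak areas (arbitrary units) for one MALDI spectrum.
example_spectrum = {"Hex5HexNAc2": 500.0, "Hex4HexNAc3": 300.0, "Hex5HexNAc3": 200.0}
print(relative_abundance(example_spectrum))
# {'Hex5HexNAc2': 50.0, 'Hex4HexNAc3': 30.0, 'Hex5HexNAc3': 20.0}
```

Normalizing within each spectrum removes differences in overall signal intensity between replicate spots, so that profiles from different samples can be compared directly.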

A total of 76 N-glycan compositions were identified (Table 1), and MALDI mass spectra of pine and multi-floral honey are shown in Fig. 1. Of these compositions, the 13 that were consistently present across experimental replicates were used in the statistical analyses. After the relative abundance of each glycan was determined, a classical t-test was applied to detect statistically significant differences in glycan abundance between the two honey types. The significantly regulated N-glycans that contributed to the differentiation of pine and multi-floral honey are shown in Fig. 2. The volcano plot in Fig. 2A shows that Hex5HexNAc3, Hex4HexNAc3, and Hex5HexNAc2 differed significantly between multi-floral and pine honeys, each being markedly more abundant in one honey type than in the other. These glycans therefore emerge as key discriminators between pine and multi-floral honey. In particular, Hex5HexNAc3 was substantially up-regulated in multi-floral honey relative to pine honey, and Hex5HexNAc2 showed the most pronounced difference between the two honey types.
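Each point in a volcano plot pairs an effect size with a significance value for one glycan. The sketch below assumes that the classical t-test is the two-sample Student's t-test with pooled variance and that the effect size is the log2 ratio of group means; both are common conventions but are not confirmed by the text. It uses only the Python standard library, computing the two-sided p-value from the regularized incomplete beta function. All function names are hypothetical, and the replicate values in the usage lines are invented.

```python
import math
from statistics import mean, variance


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        step = d * c
        h *= step
        if abs(step - 1.0) < eps:
            break
    return h


def _betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-sided p-value of Student's t: I_{df/(df+t^2)}(df/2, 1/2)."""
    return _betainc(df / 2.0, 0.5, df / (df + t * t))


def student_t_test(x, y):
    """Two-sample t-test with pooled variance; returns (t, two-sided p)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1.0 / nx + 1.0 / ny))
    return t, t_two_sided_p(t, df)


def volcano_point(group_a, group_b):
    """(log2 fold change of mean A over mean B, -log10 p) for one glycan."""
    _, p = student_t_test(group_a, group_b)
    return math.log2(mean(group_a) / mean(group_b)), -math.log10(p)


# Hypothetical relative abundances (%) of one glycan in replicate spectra.
multi_floral = [12.1, 11.8, 12.6, 12.3]
pine = [6.2, 5.9, 6.8, 6.4]
print(volcano_point(multi_floral, pine))
```

Glycans far from the origin on both axes (large fold change and small p) are the candidates for discrimination; a significance threshold such as p < 0.05 would be drawn as a horizontal line at -log10(0.05).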

Table 1 N-glycans identified in pine and multi-floral honeys
Fig. 1

MALDI-MS spectra of identified N-glycans derived from A pine honey and B multi-floral honey

Fig. 2

Comparative analysis of N-glycans between pine honey and multi-floral honey. A Volcano plot showing the N-glycans differentially regulated between the two honey types. B Principal component analysis plot showing the multivariate variation among the N-glycan profiles of the two honey types (Color figure online)

The principal component analysis in Fig. 2B shows that pine and multi-floral honey can be distinguished on the basis of the three significantly changed N-glycans. Red dots represent pine honey samples, and blue dots represent multi-floral honey samples. Component 1 explains 85.4% of the variance and component 2 explains 14.1%, and the two honey types are clearly separated in this two-dimensional space. This separation can be attributed to significant quantitative differences in the abundances of the three N-glycans, and the distinct clustering of the samples indicates that the glycan compositions of the two honey types differ markedly. Component 1, which captures most of the variance, likely reflects the fundamental differences in glycan abundance between the two honey types, while component 2 captures additional differences that contribute to their separation.
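The percentages quoted for components 1 and 2 are explained-variance ratios. Assuming, as the text indicates, that the PCA input consisted of the three significantly changed glycans, they follow from the eigenvalues of the covariance matrix as below. Whether the data were autoscaled is not stated; if they were, the same ratios are taken from the eigenvalues of the correlation matrix instead. The last line is simple arithmetic on the reported values.

```latex
% Centered data matrix X (n samples by p glycans) and its sample covariance
C = \frac{1}{n-1} X^{\top} X, \qquad C\,\mathbf{v}_k = \lambda_k \mathbf{v}_k, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0

% Fraction of the total variance explained by component k
\mathrm{EV}_k = \frac{\lambda_k}{\sum_{j=1}^{p} \lambda_j}

% With p = 3 glycans, the variance left for the third component
\mathrm{EV}_3 = 1 - (0.854 + 0.141) = 0.005
```

Because the first two components together account for 99.5% of the variance, the two-dimensional score plot retains almost all of the information in the three-glycan profiles.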

For the machine learning analysis, the relative peak areas of the glycans were determined from the mass spectrometric data, and classification models were developed in MATLAB using 5-fold cross-validation. All models available in the MATLAB Classification Learner app were evaluated, and their performance was assessed using accuracy and the area under the receiver operating characteristic (ROC) curve (AUC). Figure 3A shows the classification results for the model trained on the 13 N-glycans obtained from pine and multi-floral honey glycoproteins. The true positive rate for pine honey is 100%, meaning that all pine honey samples were correctly classified, whereas 12.5% of multi-floral honey samples were misclassified as pine honey. The model therefore classifies pine honey perfectly, but its discrimination of multi-floral honey leaves room for improvement. The overall classification accuracy of the model is 93.5%. Figure 3B presents the ROC curve for this model; the AUC of 0.93 indicates high classification performance. The ROC curve lies predominantly above the diagonal line, showing that the model successfully distinguishes pine from multi-floral honey, and the high AUC further supports accurate classification of the samples based on the selected glycans.
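The quantities reported for Fig. 3A and 3B can be computed from a confusion matrix and from classifier scores. The sketch below treats pine honey as the positive class (an assumption) and computes the AUC through its Mann–Whitney interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The function names, counts, and scores are hypothetical and do not correspond to the study's data.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Binary metrics with pine honey as the positive class (assumed)."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "tpr": tp / (tp + fn),  # pine samples correctly called pine
        "fpr": fp / (fp + tn),  # multi-floral samples wrongly called pine
    }


def roc_auc(pos_scores, neg_scores):
    """Area under the empirical ROC curve via the Mann-Whitney formulation:
    fraction of (positive, negative) pairs ranked correctly, ties = 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts and scores, for illustration only.
print(confusion_metrics(tp=9, fn=1, fp=2, tn=8))
print(roc_auc([0.9, 0.8, 0.7], [0.6, 0.75, 0.1]))
```

An AUC of 0.5 corresponds to the diagonal (chance-level ranking), and an AUC of 1.0 means every positive sample outranks every negative one.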

Fig. 3

Confusion matrices for machine learning models discriminating pine honey from multi-floral honey, trained on two feature sets: A all 13 N-glycans and C the three significantly changed N-glycans. The corresponding receiver operating characteristic (ROC) curves are shown in B for the 13 N-glycan model and D for the three N-glycan model

Figure 3C shows the confusion matrix for the classification model built on the three statistically significant glycans (Hex5HexNAc3, Hex4HexNAc3, and Hex5HexNAc2). With this feature set, all pine and multi-floral honey samples were correctly distinguished, giving 100% accuracy, which underlines the discriminative power of the three selected glycans. Figure 3D shows the ROC curve for this model, which remains above the diagonal line with an AUC of 1.00, indicating excellent classification performance. This analysis demonstrates that the model classifies the samples effectively on the basis of the selected glycans.
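The study evaluated the models in MATLAB's Classification Learner app. As an illustration only, the sketch below swaps in a simple nearest-centroid classifier and evaluates it with stratified 5-fold cross-validation, once on all 13 feature columns and once on a subset of three, mirroring the comparison between Fig. 3A and 3C. The data are synthetic (three informative columns plus ten noise columns), and all function names are hypothetical.

```python
import math
import random


def nearest_centroid_predict(train_X, train_y, x):
    """Assign x to the class whose mean feature vector is closest (Euclidean)."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids, key=lambda lab: math.dist(x, centroids[lab]))


def stratified_kfold_accuracy(X, y, k=5, seed=0):
    """Shuffle each class and deal its samples round-robin into k folds;
    each fold is held out once and predicted by a model fit on the rest."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in fold:
            correct += nearest_centroid_predict(train_X, train_y, X[i]) == y[i]
    return correct / len(y)


def select_columns(X, columns):
    """Keep only the given feature columns (e.g. the significant glycans)."""
    return [[row[c] for c in columns] for row in X]


def make_synthetic(n_per_class=10, n_noise=10, seed=1):
    """Synthetic table: columns 0-2 differ between classes, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in (("pine", (5.0, 8.0, 3.0)),
                          ("multi-floral", (10.0, 12.0, 7.0))):
        for _ in range(n_per_class):
            informative = [rng.gauss(mu, 0.5) for mu in centre]
            noise = [rng.gauss(20.0, 5.0) for _ in range(n_noise)]
            X.append(informative + noise)
            y.append(label)
    return X, y


X, y = make_synthetic()
print("all 13 columns:", stratified_kfold_accuracy(X, y))
print("3 informative columns:", stratified_kfold_accuracy(select_columns(X, [0, 1, 2]), y))
```

Here the feature subset is fixed before cross-validation. When the subset is itself chosen with a t-test on the same samples, repeating that selection inside each training fold gives a less optimistically biased accuracy estimate.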

The proteomic characterization of various honey types has been covered extensively in the literature. A recent study examined the protein composition of 13 honeys and identified a total of 130 proteins [25], and a proteomic approach has clearly differentiated honeys from distinct geographical and botanical sources [26]. These studies show that proteomic profiles differ among honey types. However, the literature on the N-glycans of honey glycoproteins is currently lacking. To our knowledge, this is the first study to use MALDI-MS-based N-glycomics and machine learning to classify honey types.

Glycosylation machinery is known to vary across organisms. Unlike proteins, glycans are not synthesized from a genetic template [27]. Instead, multiple genes contribute to the formation of the glycans attached to proteins, which makes glycans highly complex and site-, tissue-, and species-specific [28]. Whereas comprehensive proteomics experiments are comparatively involved, N-glycan profiling can be accomplished rapidly by MALDI-MS. In this study, N-glycans released from pine and multi-floral honey proteins were rapidly profiled and used for classification.
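MALDI-MS profiling assigns compositions such as Hex5HexNAc3 from measured masses, so it can help to see how a composition maps to a mass. The sketch below computes the neutral monoisotopic mass of a free, unlabeled glycan as the sum of its monosaccharide residue masses plus one water. The residue masses are standard values derived from elemental formulas; dHex (e.g. fucose) and Pent (e.g. xylose) are included as an assumption because they occur in plant N-glycans. The observed m/z of 2-AA-labeled glycans additionally depends on the label and on the ion type, neither of which is specified here, so those offsets are deliberately left out. The function names are hypothetical.

```python
import re

# Monoisotopic residue masses (Da) of monosaccharides within a glycan chain.
RESIDUE_MASS = {
    "Hex": 162.052823,     # C6H10O5, e.g. mannose, galactose
    "HexNAc": 203.079373,  # C8H13NO5, e.g. N-acetylglucosamine
    "dHex": 146.057909,    # C6H10O4, e.g. fucose
    "Pent": 132.042259,    # C5H8O4, e.g. xylose
}
WATER = 18.010565  # H2O completing the free reducing-end glycan

# Each token is a residue name followed by its count, e.g. "HexNAc3".
_TOKEN = re.compile(r"(HexNAc|dHex|Hex|Pent)(\d+)")


def parse_composition(name):
    """Parse a composition string such as 'Hex5HexNAc3' into residue counts."""
    counts = {}
    pos = 0
    for match in _TOKEN.finditer(name):
        if match.start() != pos:
            raise ValueError(f"unrecognized text in {name!r} at position {pos}")
        residue, n = match.group(1), int(match.group(2))
        counts[residue] = counts.get(residue, 0) + n
        pos = match.end()
    if pos != len(name):
        raise ValueError(f"unrecognized text in {name!r} at position {pos}")
    return counts


def glycan_mass(name):
    """Neutral monoisotopic mass of the free, unlabeled glycan."""
    counts = parse_composition(name)
    return WATER + sum(RESIDUE_MASS[r] * n for r, n in counts.items())


for composition in ("Hex5HexNAc2", "Hex4HexNAc3", "Hex5HexNAc3"):
    print(composition, round(glycan_mass(composition), 4))
```

Matching measured peaks against such calculated masses, after adding the label and adduct offsets appropriate to the actual protocol, is how a peak list becomes a list of compositions like the one in Table 1.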

Classifying honey types is crucial to protect consumers from deceptive practices [29]. The existing literature has explored various methods for classifying honey, mostly according to botanical and geographical origin. The primary objective is to establish a bioanalytical method with high accuracy and specificity in discriminating between honey types, and mass spectrometry is well suited to this purpose owing to its speed, accuracy, and precision. For example, gas chromatography–mass spectrometry (GC–MS) analysis of volatile organic compounds in honey has proven an effective means of classification [30], and combining this approach with Kohonen's self-organizing map has enabled discrimination between Turkish and Greek pine honeys [31].

Conclusion

In conclusion, a bioanalytical technique integrating MALDI-MS-based N-glycomics and machine learning was developed to classify Turkish pine honey against multi-floral honey. The method distinguished pine honey from multi-floral honey with high accuracy and specificity, and using only the statistically significant glycans for discrimination yielded 100% accuracy. This approach enables rapid differentiation between Turkish pine honey and multi-floral honey. It could be extended to distinguish pine and multi-floral honeys from diverse botanical sources, and it may also be able to differentiate between various types of multi-floral and monofloral honey. These results suggest that MALDI-MS-based glycomics coupled with machine learning is a promising and reliable candidate for identifying honey-type adulteration and for addressing variability among honeys.