Background

Chrysanthemums (Dendranthema grandiflorum) belong to the Asteraceae family and are used widely as an ornamental plant globally. Dendranthema species have great economic value for the therapeutic, cosmetic, pharmaceutical, and food industries [1], which is attributed to their abundant nutritional and bioactive compounds, encompassing essential oils and polyphenols, which have antioxidant [2, 3], anti-microbial [4], anti-inflammatory, and antibacterial [5] properties.

Korean domestic production of standard D. grandiflorum decreased from 116.17 million stems in 2019 to 97.33 million stems in 2020 and further to 95.56 million stems in 2021 [6,7,8]. In contrast, imports of D. grandiflorum have surpassed Korea’s domestic production, with 146.84 and 173.53 million stems imported in 2021 and 2022, respectively [9]. Owing to the expanding international flower trade and increasing consumer demand for premium-quality flowers, tracing the geographic origin of flowers has become increasingly important. In Korea, domestically produced D. grandiflorum is priced higher than imports, and the origin is a substantial price-determining factor. To ensure fair competition and guarantee consumer rights, an accurate analytical platform to discriminate between domestically produced and imported D. grandiflorum is required.

In recent years, numerous studies have employed various methodologies to ascertain the origin of Chrysanthemum flowers. Ultra-high-performance liquid chromatography–quadrupole time-of-flight mass spectrometry [10, 11], Fourier transform infrared spectroscopy [12], laser-induced breakdown spectroscopy [13], and gas chromatography–mass spectrometry (GC–MS) have been applied to successfully discriminate the geographical origins of Chrysanthemum flowers without complete identification or quantification of markers. Among these methods, GC–MS is a common analytical technique for detecting volatile components, and its advantages over other methods include high-quality and reproducible results, a wide dynamic range, and the availability of a universal mass spectral library [14].

Volatile organic compounds (VOCs) are a class of secondary metabolites that play important roles in protecting plants from pests and diseases, attracting pollinators, and increasing competitive advantages [15,16,17]. VOCs are reportedly closely associated with plant genotypes, environmental conditions, geographic variations, physiological factors, socio-political conditions, and harvest time [18]. VOCs emitted by different plant organs increase in response to abiotic stressors such as temperature, intense light, water, salt, and oxidative stress, as plants strive to swiftly recover, thereby enhancing their adaptability [19,20,21].

VOCs primarily consist of terpenoids, terpenoid derivatives, and fatty acid derivatives. These compounds are under the regulation of transcription factors (TFs) and epigenetic mechanisms [22], and extended exposure to various growth conditions can lead to alterations in metabolite profiles [12]. Therefore, profiling the volatile compounds in Chrysanthemum would provide origin authentication and valuable insights into the metabolic differences between various Chrysanthemum cultivars [23] and their production areas [24], and it would also highlight the underlying biosynthetic pathways associated with these compounds [2, 25].

To the best of our knowledge, VOC profiling of Chrysanthemum has primarily focused on tea cultivars [23], despite the widespread use of the Jinba, Iwa-no-hakusen, and Baekgang cultivars for ornamental purposes in East Asian countries such as China, Korea, and Japan. In addition, neither the volatile fingerprints nor the related transcriptomes of these cultivars have been reported. Mekapogu et al. [26] differentiated between Jinba, Iwa-no-hakusen, and Baekgang using SSR markers and confirmed their unique genetic backgrounds. However, this approach is limited when it comes to distinguishing the origin of the same variety cultivated in different regions. This is particularly relevant for Chrysanthemum varieties such as Jinba and Iwa-no-hakusen, which are cultivated in multiple countries, including Korea, Vietnam, and China. Therefore, it is necessary to develop a simple and reliable strategy to determine the VOC contents of Chrysanthemum flowers in consideration of both their genotype and geographic origin.

Previous studies have employed various extraction methods, including organic extraction, hydrodistillation extraction [27, 28], and headspace (HS)–solid-phase microextraction (SPME) coupled with GC–MS [25, 29] to determine the VOCs contents. However, these analytical methods involve complex pretreatment processes, can be time-consuming, and require additional experiments to establish optimal extraction conditions. HS extraction is a simple and rapid aroma extraction method that does not require sample pretreatment [30]. This technique can capture and analyze volatile chemicals from the vapor emitted into the air, owing to the given temperature and pressure conditions applied to the samples. HS extraction can provide a more comprehensive overview of the volatile fraction when compared with SPME injection [31]. A key obstacle with GC–MS technology is how to extract valuable information from a large amount of complex raw data. GC–MS fingerprint data have been effectively analyzed using multivariate analysis techniques, including unsupervised chemometric algorithms [such as principal component analysis (PCA) and hierarchical cluster analysis (HCA)] as well as supervised chemometric analyses [such as orthogonal partial least squares discriminant analysis (OPLS-DA), random forests, and k-nearest neighbor for discrimination of geographical origin] [32,33,34,35].

The present pilot study first developed a method for discriminating between Korean domestically produced and imported D. grandiflorum based on HS–GC–MS combined with chemometrics. We performed intra-day and inter-day tests to assess the precision of the developed HS–GC–MS method. The relationships among the 41 metabolites identified in the Korean and imported D. grandiflorum were also evaluated using Pearson’s correlation analysis and HCA. Subsequently, we determined the volatile compound traits influenced by multiple environmental factors and distinguished the major D. grandiflorum cultivars (Baekgang, Jinba, and Iwa-no-hakusen) grown in different countries (Korea, China, and Vietnam) via PCA, OPLS-DA, volcano plots, and clustering analyses. An external validation based on different harvest years was applied to assess the reliability of the developed geographical model.

The findings of this study may be used to effectively determine the geographical origin of D. grandiflorum and to enable a comprehensive understanding of the volatile oils of its cultivars (Jinba, Iwa-no-hakusen, and Baekgang) grown in different countries.

Methods

Chemicals and reagents

The following reference standards, each with a purity of ≥ 95%, were purchased from Sigma-Aldrich (St. Louis, MO, USA): cyclohexanol (99%), hexanal (98%), propanoic acid (99.5%), tiglic acid (99%), 6-methyl-5-hepten-2-one (99%), camphene (95%), eucalyptol (99%), camphor (96%), and 2,5-dimethyl pyrazine (98%). Analytical standard grade acetoin, isovaleric acid, (–)-bornyl acetate, terpinen-4-ol, α-terpineol, and β-elemene were obtained from Supelco Inc. (Sigma-Aldrich). Moreover, β-pinene, α-terpinene, γ-terpinene, p-cymene, linalool, β-caryophyllene, and caryophyllene oxide were used as certified reference materials. Sabinene was used as the phytoproof® reference substance (Phytolab GmbH & Co, Vestenbergsgreuth, Germany).

D. grandiflorum samples

Ten Korean, four Chinese, and six Vietnamese D. grandiflorum samples were collected from the Yangjae flower market center in Seoul, South Korea, and used in this study (Additional file 1: Table S1). All samples were collected between April and July 2022. The 10 Korean samples were Iwa-no-hakusen (n = 9) and Baekgang (n = 1), and were purchased from a wholesale dealer immediately after the flower auction. All four Chinese samples were of the Iwa-no-hakusen cultivar (n = 4), whereas among the six Vietnamese samples, four were Iwa-no-hakusen (n = 4) and two were Jinba (n = 2). The 10 Korean samples were obtained from representative D. grandiflorum production areas. The non-Korean samples were purchased by retail businesses from import wholesale suppliers. Both the Korean domestic and non-Korean D. grandiflorum were purchased in boxes, each clearly labeled with the place of origin and the variety. Each box contained 15 to 20 bunches of D. grandiflorum. Here, n = 1 corresponds to one box, and “specimen” refers to the data obtained by sampling the same box three times for analysis (three specimens per box). The samples collected in 2023 were obtained using the same method applied in 2022, and the detailed information is presented in Additional file 1: Table S1.

Twenty flower heads were used as one sample. The samples were frozen at − 40 °C for 8 h and lyophilized for 72 h. Samples were ground using a grinding mill (IKA, Königswinter, Germany) to obtain homogeneous powders. The powders were stored in polyethylene sample bags at − 40 °C until analysis.

HS conditions

Each sample (0.5 g) was placed in a 20-mL headspace glass vial, and 5 µL of (–) endoborneol (2000 µg/mL in methanol) was added to each sample as an internal standard for quantitative analysis. The vials were sealed with a polytetrafluoroethylene-faced silicone septum and placed in a static headspace autosampler (EST Analytical, Township, OH, USA). The incubation temperature and time were optimized (Additional file 1: Figure S1). The optimized HS sampling conditions were as follows: incubation temperature, 120 °C; incubation time, 50 min; agitation type, original; agitation speed, 337 rpm; agitation duration, 50 min; quantitative syringe volume, 2.5 mL; and needle sweep time, 5.0 min.

GC–MS conditions

The syringe containing the HS sample with a 2.5 mL injection volume was inserted into the GC injection port with a split ratio of 30:1 at 250 °C. A Bruker Scion 436 GC chromatograph coupled with a Scion TQ (Bruker, Bremen, Germany) mass spectrometer was applied for the extraction and analysis of the volatile compounds. The transfer line, ion source, and detector temperatures were 250 °C, 230 °C, and 300 °C, respectively.

The instrument was equipped with a DB-wax column (30 m × 0.32 mm × 0.15 μm) and helium was used as the carrier gas at a constant flow rate of 1.0 mL/min. The oven temperature program was as follows: the initial temperature was set at 45 °C, held for 4 min, subsequently increased to 55 °C at 2 °C/min, increased to 70 °C at 1 °C/min, later increased to 110 °C at 2 °C/min, finally increased to 240 °C at 20 °C/min, and held for 2 min.
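As a quick consistency check on the oven program described above, the total run time can be obtained by summing the ramp and hold segments. The following is a minimal sketch; the data layout and function name are illustrative and are not part of the instrument software.

```python
# Oven program from the text: (target temperature in °C, ramp rate in °C/min, hold in min).
# A ramp rate of None marks the initial isothermal step.
OVEN_PROGRAM = [
    (45, None, 4.0),   # initial temperature, held for 4 min
    (55, 2.0, 0.0),
    (70, 1.0, 0.0),
    (110, 2.0, 0.0),
    (240, 20.0, 2.0),  # final temperature, held for 2 min
]


def total_run_time(program):
    """Return the total run time in minutes for a list of oven steps."""
    total = 0.0
    previous_temp = None
    for target, rate, hold in program:
        if rate is not None:
            # Time spent ramping from the previous temperature to the target.
            total += (target - previous_temp) / rate
        total += hold
        previous_temp = target
    return total


run_time_min = total_run_time(OVEN_PROGRAM)
print(f"Total GC run time: {run_time_min} min")
```

Under these settings, the segments sum to 4 + 5 + 15 + 20 + 6.5 + 2 = 52.5 min per injection.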

Electron ionization was performed at 70 eV with the quadrupole mass analyzer. Mass spectra were acquired in full scan mode over an ion range of 40 to 400 m/z. The system was controlled, and the GC–MS data were acquired and processed, using Bruker MSWS8 software (Billerica, MA, USA). Volatile compounds were identified by comparing their mass spectra and retention indices (RIs) with those in the NIST Library (NIST14). Furthermore, RIs determined relative to n-alkanes (C7–C30) were compared with literature values or with those obtained from reference standards.
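The text does not state which retention index formula was applied. For temperature-programmed runs, the linear retention index of van den Dool and Kratz is commonly used, and the sketch below assumes that formula. The function name and the alkane retention times are hypothetical and serve only as an illustration.

```python
def linear_retention_index(rt, alkane_rts):
    """Linear retention index of a peak eluting at `rt` (min).

    `alkane_rts` maps n-alkane carbon number to retention time (min).
    Assumes the van den Dool and Kratz formula for programmed-temperature GC.
    """
    carbons = sorted(alkane_rts)
    for n, n_next in zip(carbons, carbons[1:]):
        t_n, t_next = alkane_rts[n], alkane_rts[n_next]
        if t_n <= rt <= t_next:
            # Interpolate linearly between the two bracketing n-alkanes.
            return 100.0 * (n + (n_next - n) * (rt - t_n) / (t_next - t_n))
    raise ValueError("retention time lies outside the n-alkane range")


# Hypothetical alkane retention times, not measured values from this study.
example_alkanes = {10: 8.0, 11: 12.0, 12: 17.0}
example_ri = linear_retention_index(10.0, example_alkanes)
print(example_ri)
```

A peak eluting midway between C10 and C11 in this toy ladder is assigned an index of 1050.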

Volatile components were quantified by determining the ratio of the peak area of the volatile components relative to the peak area of the internal standard compound in the total ion chromatogram obtained via GC–MS [36].
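The ratio-based quantification can be illustrated as follows. The internal standard amount (5 µL of a 2000 µg/mL solution, i.e., 10 µg) and the sample mass (0.5 g) are taken from the HS conditions; converting the area ratio to µg/g with a response factor of 1 is an assumption made here for illustration, and the peak areas are hypothetical.

```python
IS_VOLUME_UL = 5.0          # internal standard volume added per vial (µL)
IS_CONC_UG_PER_ML = 2000.0  # internal standard concentration (µg/mL)
SAMPLE_MASS_G = 0.5         # sample mass per vial (g)


def internal_standard_mass_ug():
    """Mass of internal standard added to each vial, in µg."""
    return IS_VOLUME_UL / 1000.0 * IS_CONC_UG_PER_ML


def relative_content_ug_per_g(area_analyte, area_is, response_factor=1.0):
    """Semi-quantitative content from the analyte-to-internal-standard area ratio.

    A response factor of 1.0 is an assumption, not a value from the study.
    """
    ratio = area_analyte / area_is
    return ratio * response_factor * internal_standard_mass_ug() / SAMPLE_MASS_G


# Hypothetical peak areas from a total ion chromatogram.
example_content = relative_content_ug_per_g(area_analyte=3.0e6, area_is=1.5e6)
print(example_content)
```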

Multivariate analysis

Data were normalized based on the median fold change |FC| (so that the adjusted median of peak intensity fold changes between samples in a set of experiments was approximately zero), followed by Pareto scaling (which retains the relative importance of large values while keeping the data structure intact) for baselining. PCA, fold change analysis, HCA with Pearson’s correlation, and clustering analysis (distance metric: squared Euclidean) were implemented using the Mass Profiler Professional (MPP) software (version 15.0, Agilent Technologies, Santa Clara, CA, USA). OPLS-DA and VIP analyses were conducted using the SIMCA 17 software (Umetrics AB, Umea, Sweden). ANOVA testing of cross-validated predictive residuals (CV-ANOVA) and random permutation tests were performed to evaluate the reliability of the OPLS-DA model.
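Pareto scaling is usually defined as mean-centering each variable and dividing by the square root of its standard deviation. The sketch below follows that textbook definition; the exact implementation inside the MPP software is not described in the text, and the input values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def pareto_scale(values):
    """Mean-center one variable and divide by the square root of its sample SD."""
    center = mean(values)
    spread = stdev(values)
    if spread == 0:
        # A constant variable carries no information after centering.
        return [0.0 for _ in values]
    return [(v - center) / sqrt(spread) for v in values]


# Hypothetical relative contents of one compound across four specimens.
raw_values = [2.0, 4.0, 6.0, 8.0]
scaled_values = pareto_scale(raw_values)
print(scaled_values)
```

After scaling, a variable has a standard deviation equal to the square root of its original standard deviation, so abundant compounds are down-weighted less severely than under unit-variance scaling.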

Evaluation of classification performance

To determine the performance of the established models, we evaluated the sensitivity, specificity, accuracy, and area under the receiver operating characteristic (ROC) curve (AUC). Sensitivity indicates whether a model correctly identifies positive cases out of all actual positive cases (TP/(TP + FN)); specificity indicates whether a model correctly identifies negative cases out of all actual negative cases (TN/(TN + FP)); and accuracy indicates the ratio of correctly predicted samples to the total number of samples evaluated ((TP + TN)/(TP + FN + TN + FP)), where TP, TN, FN, and FP denote true positives, true negatives, false negatives, and false positives, respectively [37]. An ROC curve is a two-dimensional depiction of classifier performance obtained by plotting the true positive rate (TPR) against the false positive rate (FPR), and the AUC is the portion of the area of the unit square lying under this curve [38]. The AUC is always between 0 and 1.0, with a higher value indicating better performance.
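These definitions can be written out directly. The sketch below computes sensitivity, specificity, and accuracy from binary labels, and computes the AUC through its rank interpretation (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one), which equals the area under the ROC curve. The labels and scores are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FN, TN, FP) for binary labels."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def auc_from_scores(y_true, scores, positive=1):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = positive class, 0 = negative class.
labels = [1, 1, 1, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]

tp, fn, tn, fp = confusion_counts(labels, predicted)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / (tp + fn + tn + fp)
auc = auc_from_scores(labels, scores)
print(sensitivity, specificity, accuracy, auc)
```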

Results and discussion

Identification of D. grandiflorum metabolites via GC–MS

A total of 41 compounds were identified from 30 D. grandiflorum specimens cultivated in Korea and 30 imported D. grandiflorum specimens collected in 2022; their mean values and standard deviations (SD) are listed in Table 1.

Table 1 Concentrations, mean values, and standard deviations of 41 volatile compounds determined in Korean and imported D. grandiflorum

These 41 compounds were identified by matching their mass spectral data to NIST14 with a similarity score > 800, in accordance with previous reports [35, 39, 40]. As a general principle, values of 800 or greater indicate a good match [41]. In a further step, 23 peaks were identified by comparing the mass spectra and retention times with those of the commercial standards (Additional file 1: Figure S2). As shown in Additional file 1: Table S2, the other compounds, for which we did not have reference standards, were tentatively identified by matching each compound’s RI with that in the NIST Chemistry WebBook database (http://webbook.nist.gov/chemistry/) and the literature [42]. The 41 metabolites were divided into six classes: 22 monoterpenoids (53.7%), 9 sesquiterpenoids (22.0%), 3 fatty acids (7.3%), 3 alcohols (7.3%), 3 others (7.3%), and 1 ketone (2.4%; Fig. 1a). Almost all of these volatile components have been previously reported in Chrysanthemum flowers [24, 43,44,45].
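The class frequencies reported above follow directly from the compound counts. A minimal sketch of the arithmetic, using the counts stated in the text (the dictionary layout is illustrative):

```python
# Number of identified compounds per chemical class, as stated in the text.
class_counts = {
    "monoterpenoids": 22,
    "sesquiterpenoids": 9,
    "fatty acids": 3,
    "alcohols": 3,
    "others": 3,
    "ketones": 1,
}

total_compounds = sum(class_counts.values())

# Frequency of each class as a percentage of all identified compounds.
class_percentages = {
    name: round(100.0 * count / total_compounds, 1)
    for name, count in class_counts.items()
}
print(total_compounds, class_percentages)
```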

Fig. 1

Chemical classification of 41 metabolites identified in D. grandiflorum. a Frequency rate of volatile components in Korean and non-Korean samples. Percentage of separate categories based on concentrations released from b Korean and c non-Korean groups

Cyclohexanol has been reported in mangoes [46], and butanoic acid, 4-hydroxy- has been reported in apples [47]. However, the present study is the first to report their presence in the Chrysanthemum genus. Moreover, hydroxyacetone and 3,6-octadienoic acid, 3,7-dimethyl-, methyl ester, (Z)- were reported for the first time in D. grandiflorum.

Figure 1b, c shows the comparative analysis of chemical constituents between Korean and imported D. grandiflorum samples, and the constituents include monoterpenoids (80.0% and 72.1%, respectively), sesquiterpenoids (10.5% and 17.0%, respectively), fatty acids (2.4% and 2.8%, respectively), alcohols (5.0% and 5.7%, respectively), others (1.0% and 1.4%, respectively), and ketones (1.0% and 1.0%, respectively).

For Korean D. grandiflorum, the order of the top 5 in terms of their average contents was as follows: trans-chrysanthenyl acetate > eucalyptol > camphor > β-elemene > cis-chrysanthenyl acetate. For imported D. grandiflorum, the order of the top five in terms of their average contents was as follows: trans-chrysanthenyl acetate > eucalyptol > β-elemene > cis-chrysanthenyl acetate > camphor. Notably, cyclohexanol, (–)-bornyl acetate, camphene, and linalool were present only in Korean D. grandiflorum (Table 1).

Chang and Kim [48] reported that the most abundant volatile compounds in D. grandiflorum grown in Korea were chrysanthenyl acetate (43.74%) followed by 2-pinen-4-ol (27.85%), eucalyptol (5.28%), and 4-terpineol (2.57%). Yang et al. [3] analyzed 36 flavor components in white Chrysanthemum cultivated in China and identified camphor (9.53 μg) as the most abundant component. Zhan et al. [49] reported that the volatile profile of a Chrysanthemum cultivar, fubaiju, which originates from the Hubei province (China), mainly comprises camphor (16.87%), eucalyptol (10.62%), β-curcumene (9.51%), (–)-bornyl acetate (7.43%), α-zingiberene (6.97%), and β-elemene (6.47%). In contrast, in this study, trans-chrysanthenyl acetate was present in the highest proportion in both domestically produced and imported samples, whereas camphor was reported as the most abundant component in the essential oil from Chinese Chrysanthemum. Thus, although the D. grandiflorum flowers examined in previous studies [3, 48, 49] showed a qualitative composition similar to that observed here, the relative proportions of their constituents differed from those of the present study.

When glycerol undergoes dehydration, hydroxyacetone is generated [50]. Hu et al. [51] reported that hydroxyacetone is mainly derived from glucose pyrolysis. Hydroxyacetone is not typically detected directly in plants, but it is presumed to be a major intermediate, primarily produced from glycerol or glucose. Acetoin and 2,5-dimethyl pyrazine are produced through the Maillard reaction, a process in which sugars and amino acids react upon heating, resulting in the development of a pleasing taste and flavor.

In this study, hydroxyacetone, acetoin, and 2,5-dimethyl pyrazine were presumed to have formed as intermediates in D. grandiflorum flowers during the 50-min headspace incubation at 120 °C.

Repeatability of HS–GC–MS

The intra-day and inter-day repeatability were determined by analyzing three replicates of the 41 compounds on the same day (intra-day) and on three consecutive days (inter-day), and were expressed as the relative standard deviations (RSDs) of the contents of all analytes. The RSDs of the intra-day and inter-day analyses ranged from 1.0 to 10.12% and from 8.66 to 20.89%, respectively (Additional file 1: Table S3). Therefore, the HS–GC–MS method developed to identify the volatile compounds in D. grandiflorum flowers showed good repeatability. Shen et al. [44] analyzed the intra-day repeatability of four constituents from Chrysanthemum indicum volatiles via SPME–GC–MS and reported values ranging from 6.84 to 10.04% for eucalyptol, from 3.32 to 9.61% for camphor, from 3.55 to 8.93% for borneol, and from 4.28 to 9.07% for bornyl acetate, which are consistent with the results of the present study obtained using HS–GC–MS. However, HS–GC–MS is simpler than SPME–GC–MS and requires fewer parameters to be optimized because it does not involve an SPME fiber.
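The RSD of a set of replicate measurements is the sample standard deviation divided by the mean, expressed as a percentage. A minimal sketch, using hypothetical triplicate contents rather than values from Table S3:

```python
from statistics import mean, stdev


def rsd_percent(values):
    """Relative standard deviation (%) of replicate measurements."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical triplicate contents of one compound measured on the same day.
intra_day_replicates = [10.0, 11.0, 12.0]
intra_day_rsd = rsd_percent(intra_day_replicates)
print(f"Intra-day RSD: {intra_day_rsd:.2f}%")
```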

PCA and OPLS-DA

Discrimination of the geographical origin of D. grandiflorum

PCA was initially conducted to evaluate the degree of clustering and variance among samples associated with the D. grandiflorum origin and cultivar. As shown in Fig. 2, the three-dimensional PCA scatter plot distributed the Korean and non-Korean samples along principal component 1 (PC1); PC1, PC2, and PC3 explained 35.62%, 19.02%, and 8.18% of the total variation, respectively. The Vietnamese Jinba and Korean Baekgang clusters were observed along PC2. The Korean and non-Korean Iwa-no-hakusen (from both China and Vietnam) overlapped along PC1, followed by PC3. Notably, although the PCA grouped the D. grandiflorum cultivars, it could not provide a clear separation between the Korean and non-Korean groups.

Fig. 2

Three-dimensional scatter plot of Korean and non-Korean Dendranthema grandiflorum with three cultivars (Baekgang, Iwa-no-hakusen, Jinba). Profiles based on principal component analysis

OPLS-DA is a supervised classification method. It is better equipped than PCA (an unsupervised method) to deal with within-class variation because it identifies and models the variation specifically related to the response variables, thus enabling the discovery of highly discriminative markers [52]. In this study, the 41 compounds were assigned as the X-variables, and the Korean and non-Korean categories were assigned as the Y-variables. To prevent over-fitting of the OPLS-DA model, we conducted a sevenfold internal cross-validation, and successful validation was determined by obtaining a significant result (P < 0.05) in the CV-analysis of variance (ANOVA) for the model. Additionally, a permutation test with 200 permutations was performed. As shown in Fig. 3a, the two groups showed a clear separation in the OPLS-DA score plot. The model’s fitness and predictive ability were then assessed by examining the R2Y and Q2 values, respectively. R2Y and Q2 values closer to 1 indicate greater model stability and reliability, with Q2 > 0.5 accepted as indicating good predictability [35]. The OPLS-DA model explained a cumulative 74.7% (R2X) of the total variance and consisted of one predictive component (10.5%) and five orthogonal components (64.2%). The resulting R2Y and Q2 values were 0.91 and 0.849, respectively, and a CV-ANOVA P-value < 0.05 indicated good fitness and good predictive ability of the OPLS-DA model. The permutation test yielded an R2Y-intercept value of 0.357 and a Q2Y-intercept value of − 0.692, indicating that the model was statistically valid and did not overfit the data (Fig. 3b).
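For orientation, R2Y and Q2 share the same form, one minus a residual sum of squares divided by the total sum of squares of Y; R2Y uses the fitted values, whereas Q2 uses the values predicted for samples left out during cross-validation. The sketch below illustrates this general definition with hypothetical numbers. It is a simplification and does not reproduce SIMCA's internal computation.

```python
def explained_fraction(y_observed, y_estimated):
    """Return 1 - (residual sum of squares / total sum of squares)."""
    y_mean = sum(y_observed) / len(y_observed)
    total_ss = sum((y - y_mean) ** 2 for y in y_observed)
    residual_ss = sum((y - e) ** 2 for y, e in zip(y_observed, y_estimated))
    return 1.0 - residual_ss / total_ss


# Hypothetical class membership (1 = Korean, 0 = non-Korean) and model outputs.
y_class = [1.0, 1.0, 0.0, 0.0]
y_fitted = [0.9, 0.8, 0.1, 0.2]        # fitted on all samples -> R2Y
y_cross_val = [0.8, 0.7, 0.2, 0.3]     # predicted when left out -> Q2

r2y = explained_fraction(y_class, y_fitted)
q2 = explained_fraction(y_class, y_cross_val)

# Cumulative R2X reported in the text: predictive plus orthogonal components (%).
r2x_total = round(10.5 + 64.2, 1)
print(r2y, q2, r2x_total)
```

Because cross-validated predictions are made for samples the model has not seen, Q2 is typically lower than R2Y, and a large gap between the two suggests over-fitting.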

Fig. 3

Orthogonal partial least squares discriminant analysis (OPLS-DA) model results classified Korean and non-Korean D. grandiflorum profiles. a OPLS-DA score plot: R2X (0.747), R2Y (0.91), and Q2 (0.849). b Permutation test plot: R2Y-intercept (0.357), Q2-intercept (-0.692). c Variable importance projection (VIP) plot; red represents VIP > 1.0 indicating the most influential metabolites in the OPLS-DA model. d ROC using 41 metabolites of OPLS-DA

The OPLS-DA model built to discriminate between Korean and non-Korean D. grandiflorum showed a high discrimination potential and predictive ability. Even within the same Iwa-no-hakusen cultivar, distinct differences were detected between the Korean and non-Korean samples based on the OPLS-DA results. To select the characteristic components, variable importance projection (VIP) values were used, and 15 components showed a VIP > 1 (Fig. 3c). A VIP score greater than 1.0 is used as the threshold criterion for variable selection, as the respective variables are considered to be the most influential in the model. The higher the VIP score, the more important the variable.

Receiver operating characteristic (ROC) analysis was conducted to evaluate the performance of OPLS-DA model in classification based on 41 markers. As shown in Fig. 3d, the area under the ROC curve (AUC) values of 1 were obtained for both Korean and imported D. grandiflorum, indicating that the model perfectly distinguished the two classes.

Discrimination of D. grandiflorum cultivars

OPLS-DA was also performed to identify the influence of D. grandiflorum cultivar on the differences in metabolites. The 41 compounds were designated as the X-variables, and the Jinba, Baekgang, and Iwa-no-hakusen groups collected in 2022 were designated as the Y-variables. As shown in Additional file 1: Figure S3a, the OPLS-DA model demonstrated good predictive values (R2Y = 0.703, Q2 = 0.659). The genotypes were clearly divided into Jinba, Baekgang, and Iwa-no-hakusen. However, the Jinba and Baekgang samples fell outside Hotelling’s T2 95% confidence interval. This result is probably attributable to the lower number of specimens for Jinba and Baekgang compared to Iwa-no-hakusen. A total of 18 compounds presented VIP values higher than 1, with filifolone, tiglic acid, and trans-chrysanthenyl acetate being the highest-ranking compounds (Additional file 1: Figure S3c). Comparing the components with VIP > 1 in origin determination (Fig. 3c) to those in variety determination revealed that four components (tiglic acid; 3,6-octadienoic acid, 3,7-dimethyl-, methyl ester; camphor; and eucalyptol) exhibited VIP > 1 in both origin and variety determinations, indicating their high influence in both aspects. The remaining 14 components showed a high influence only in variety determination. The influence of 7 components (β-pinene, α-terpineol, terpinen-4-ol, sabinene, γ-terpinene, β-phellandrene, and α-terpinene) was lower in origin determination, as indicated by VIP scores ranging from 0.68 to 0.88. However, in variety determination, their VIP scores were greater than 1, indicating a higher level of influence on genotype. These 7 components are closely related to the monoterpene synthesis pathway through the α-terpinyl cation. They either directly originate from the α-terpinyl cation (β-phellandrene and α-terpineol), are formed from the evolved terpinen-4-yl cation (α-terpinene, γ-terpinene) through isomerization, or emerge from the intermediate pinyl cation (β-pinene) derived from the α-terpinyl cation. α-Terpineol undergoes cyclization to produce eucalyptol [53].

Pearson’s correlation analysis and hierarchical cluster analyses

To obtain insights into the correlation among the 41 metabolites identified in Korean and non-Korean D. grandiflorum, a Pearson’s correlation analysis was conducted, and an HCA was performed to assemble and graphically represent the correlations (Fig. 4).

Fig. 4

Correlation matrix of 41 volatile compounds from Korean and imported D. grandiflorum, including three cultivars. Each square indicates the Pearson’s correlation coefficient of a pair of compounds, and the correlation coefficient value is represented by the intensity of blue or red colors, as indicated on the color scale

In the correlation analysis, a positive correlation indicates that the variables increase together, whereas a negative correlation indicates that as one variable increases, the other decreases. High correlations among metabolites suggest that they are produced through closely or indirectly related pathways.
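Pearson’s correlation coefficient between two metabolites can be computed from their contents across specimens as the covariance divided by the product of the standard deviations. A minimal sketch with hypothetical values:

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Hypothetical contents of three compounds across four specimens.
compound_a = [1.0, 2.0, 3.0, 4.0]
compound_b = [2.0, 4.0, 6.0, 8.0]   # rises with compound_a
compound_c = [8.0, 6.0, 4.0, 2.0]   # falls as compound_a rises

r_positive = pearson_r(compound_a, compound_b)
r_negative = pearson_r(compound_a, compound_c)
print(r_positive, r_negative)
```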

The analysis revealed three groups of metabolites with strong intercorrelations (r > 0.8, P < 0.001); the first group included almost all sesquiterpenoids, the second included mostly monocyclic monoterpenoids, and the third included mainly bicyclic monoterpenoids.

The markers belonging to the sesquiterpenoid family, helminthogermacrene, β-bisabolene, and α-selinene, were highly positively correlated (r > 0.8 and P < 0.001). α-Terpinene, γ-terpinene, terpinen-4-ol, α-terpineol, and β-phellandrene are involved in the conversion of terpenes into monocyclic monoterpenes [53]. Our results showed that these metabolites have a strong positive correlation with each other; α-terpinene was highly correlated with γ-terpinene (r = 0.9848, P < 0.001), terpinen-4-ol (r = 0.9474, P < 0.001), α-terpineol (r = 0.8232, P < 0.001), and β-phellandrene (r = 0.9327, P < 0.001). Moreover, γ-terpinene was strongly correlated with terpinen-4-ol, α-terpineol, and β-phellandrene (r > 0.8, P < 0.001), and α-terpineol showed a strong correlation with terpinen-4-ol and β-phellandrene (r > 0.8, P < 0.001). This observation is consistent with a previous study reporting that α-terpinene showed a strong correlation with α-terpineol, γ-terpinene, terpinen-4-ol, and β-phellandrene in chemotype 2 of Melaleuca alternifolia [54].

Sabinene is a bicyclic monoterpenoid derived from the sabinyl cation. However, in our study, it was strongly correlated with other terpene compounds, such as α-terpinene, γ-terpinene, terpinen-4-ol, α-terpineol, and β-phellandrene. This result is likely attributable to the fact that the sabinyl cation is derived from the α-terpinyl cation, which is a common precursor for various terpene compounds, leading to a high degree of relatedness among them. Moreover, three metabolites, namely camphor, eucalyptol, and trans-chrysanthenyl acetate, showed a strong correlation with each other (r = 0.8, P < 0.001).

Volcano plots and clustering analyses

Clustering and volcano plot analyses were performed to identify differential metabolites that play an important role in distinguishing between Korean and non-Korean D. grandiflorum. The clustering analysis also provided a color-coded visual representation of the metabolite information. The 41 components from 60 D. grandiflorum specimens were imported into the Mass Profiler Professional (MPP) program as variables, and hierarchical clustering was performed using the squared Euclidean distance as the distance metric and Ward’s method as the linkage rule.

A clear distinction was observed in the clustering analysis results between the Korean and non-Korean groups. The Korean cluster exhibited two strong red-colored compounds (pinocarvone and isopinocarveol), three red-colored compounds (eucalyptol, hexanal, and 3,6-octadienoic acid, 3,7-dimethyl-, methyl ester), and five orange-colored compounds (α-terpineol, linalool, camphor, camphene, and cyclohexanol) as distinctive aroma substances. The non-Korean cluster demonstrated three prominent red-colored compounds (tiglic acid, β-caryophyllene, and α-selinene), four red-colored compounds (propanoic acid, cis-Z-α-bisabolene epoxide, butanoic acid, 4-hydroxy-, and aromandendrene), and one orange-colored compound (helminthogermacrene) as distinctive aroma substances. Among the compounds that showed red or orange colors, all except three metabolites (linalool with VIP = 0.98, α-terpineol with VIP = 0.88, and cyclohexanol with VIP = 0.926) exhibited VIP > 1 in the OPLS-DA model.

A volcano plot is an effective tool for visualizing the differential abundance between two conditions by relating P-values to fold change values. Fold change was used to evaluate the absolute ratio (non-Korean vs. Korean) of the 41 volatile compounds, with a P-value < 0.05 and a fold change cut-off of 1.2. P-values were obtained using the unpaired t-test with Benjamini–Hochberg’s false discovery rate (FDR) correction for multiple testing. Fifteen metabolites were identified as important contributors (the list of differential features is shown in Additional file 1: Table S4). Among these, 8 metabolites, including hexanal (P = 0.006, FC = − 1.51, VIP = 1.15), eucalyptol (P < 0.0001, FC = − 1.67, VIP = 1.34), camphor (P = 0.013, FC = − 1.4, VIP = 1.07), pinocarvone (P < 0.0001, FC = − 2.13, VIP = 1.68), isopinocarveol (P < 0.0001, FC = − 2.15, VIP = 1.76), linalool (P < 0.0001, FC = − 1.37, VIP = 0.98), camphene (P = 0.004, FC = − 1.42, VIP = 1.01), and cyclohexanol (P = 0.004, FC = − 1.34, VIP = 0.93), were significantly predominant in Korean D. grandiflorum. Excluding hexanal and cyclohexanol, the remaining six markers belonged to the monoterpenoid family. The relative levels of seven metabolites, including propanoic acid (P < 0.0001, FC = 1.48, VIP = 1.05), β-caryophyllene (P < 0.0001, FC = 2.09, VIP = 1.59), α-selinene (P < 0.0001, FC = 1.98, VIP = 1.62), tiglic acid (P < 0.0001, FC = 1.82, VIP = 1.39), butanoic acid, 4-hydroxy- (P = 0.002, FC = 1.54, VIP = 1.21), cis-Z-α-bisabolene epoxide (P = 0.008, FC = 1.45, VIP = 1.03), and aromandendrene (P < 0.0001, FC = 1.59, VIP = 1.25), were significantly higher in non-Korean D. grandiflorum. The volcano plot analysis results were largely consistent with those obtained by the clustering analysis.
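The two volcano plot criteria can be sketched as follows. The Benjamini–Hochberg adjustment is the standard step-up procedure; the signed fold change convention (a ratio below 1 is reported as the negative reciprocal) is an assumption that matches the negative FC values listed above but is not spelled out in the text. The input P-values and means are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def signed_fold_change(mean_non_korean, mean_korean):
    """Ratio non-Korean/Korean; ratios below 1 become the negative reciprocal (assumed convention)."""
    ratio = mean_non_korean / mean_korean
    return ratio if ratio >= 1.0 else -1.0 / ratio


def is_differential(adjusted_p, fold_change, alpha=0.05, fc_cutoff=1.2):
    """Apply the adjusted P-value and absolute fold change thresholds."""
    return adjusted_p < alpha and abs(fold_change) > fc_cutoff


# Hypothetical raw P-values from unpaired t-tests on four compounds.
raw_p = [0.01, 0.04, 0.03, 0.5]
adjusted_p = benjamini_hochberg(raw_p)
fc_up = signed_fold_change(2.0, 1.0)     # higher in non-Korean samples
fc_down = signed_fold_change(1.0, 2.0)   # higher in Korean samples
print(adjusted_p, fc_up, fc_down)
```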

Although 3,6-octadienoic acid, 3,7-dimethyl-methyl ester had a VIP score higher than 1, it was not selected as a significant marker in the volcano plot because its P-value was slightly higher than 0.05. In contrast, cyclohexanol and linalool had P-values lower than 0.05 and absolute fold changes above 1.2 in the volcano plot, indicating significantly higher levels in domestic samples; however, their VIP scores were slightly below 1 (0.93 and 0.98, respectively). Ultimately, the 13 metabolites that satisfied both the VIP score and volcano plot criteria were selected as the final origin-discriminating markers. The clustering and volcano plot analyses showed that Korean D. grandiflorum was characterized mainly by monoterpenoids, whereas non-Korean D. grandiflorum was characterized mainly by sesquiterpenoids (Fig. 5a and b).
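The final selection step can be checked directly from the values reported in the text. The sketch below applies the VIP > 1 criterion to the 15 volcano plot hits and recovers the 13 final markers. The fold change and VIP values are those quoted above, while the variable and function names are illustrative only.

```python
# (name, signed fold change, VIP) for the 15 volcano plot hits, as reported in the text.
# Negative fold change: higher in Korean samples; positive: higher in non-Korean samples.
reported_hits = [
    ("hexanal", -1.51, 1.15),
    ("eucalyptol", -1.67, 1.34),
    ("camphor", -1.40, 1.07),
    ("pinocarvone", -2.13, 1.68),
    ("isopinocarveol", -2.15, 1.76),
    ("linalool", -1.37, 0.98),
    ("camphene", -1.42, 1.01),
    ("cyclohexanol", -1.34, 0.93),
    ("propanoic acid", 1.48, 1.05),
    ("beta-caryophyllene", 2.09, 1.59),
    ("alpha-selinene", 1.98, 1.62),
    ("tiglic acid", 1.82, 1.39),
    ("butanoic acid, 4-hydroxy-", 1.54, 1.21),
    ("cis-Z-alpha-bisabolene epoxide", 1.45, 1.03),
    ("aromadendrene", 1.59, 1.25),
]


def select_final_markers(hits, vip_cutoff=1.0):
    """Keep volcano plot hits whose OPLS-DA VIP score exceeds the cutoff."""
    return [(name, fc, vip) for name, fc, vip in hits if vip > vip_cutoff]


final_markers = select_final_markers(reported_hits)
excluded = [name for name, _fc, vip in reported_hits if vip <= 1.0]
korean_markers = [name for name, fc, _vip in final_markers if fc < 0]
non_korean_markers = [name for name, fc, _vip in final_markers if fc > 0]

print(len(final_markers))       # 13
print(excluded)                 # ['linalool', 'cyclohexanol']
print(len(korean_markers))      # 6
print(len(non_korean_markers))  # 7
```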

Fig. 5
Characteristic metabolites of D. grandiflorum from Korean and non-Korean samples. a Clustering visualization (Euclidean distance, Ward algorithm) and b volcano plot (adjusted P-value < 0.05, fold change > 1.2). Relative concentrations are indicated by colored boxes.

The results of this study are similar to those of Yang et al. [11], who compared the volatile metabolites of Chrysanthemum produced in the southern and northern regions of China. Southern Chrysanthemum had a higher sesquiterpenoid content than northern Chrysanthemum, whereas northern Chrysanthemum had a higher monoterpenoid content than southern Chrysanthemum. Similarly, the eucalyptol content of Polish thyme, which grows at a comparatively higher latitude, was higher than that of Spanish and Moroccan thyme [35]. Moreover, Wen et al. [55] reported that the concentrations of 11 sesquiterpenoids, including β-caryophyllene, increased in the leaves of Chrysanthemum nankingense when exposed to heat stress above 45 °C, whereas no monoterpenoids were detected. The southwestern region of China and Vietnam, which are the major production areas of D. grandiflorum imported by Korea, are located at lower latitudes than South Korea and therefore have higher annual temperatures. Temperature, as a regional trait, may therefore be a dominant influence on the variation in sesquiterpenoid levels between Korean and imported D. grandiflorum.

Geographical origin model external validation

Before the classification model can be used to help control illegal activity, it should be validated with an external set of samples that were not used for model construction. To that end, samples collected in 2023 were used to validate the OPLS-DA model, which was built on the 41 markers from the 2022 samples. The validation samples were collected from Korea and from non-Korean sources (China and Vietnam) so that the model could be tested on different samples across the 2 years of the study.

The OPLS-DA model exhibited good predictive capability, correctly classifying approximately 93.3% of the test samples (Table 2).

Table 2 Classification of the geographical origin of D. grandiflorum using OPLS-DA based on 41 metabolites from HS–GC–MS data

Although the Chinese Makoto cultivar was not included in the calibration set, the external validation correctly identified it as foreign. Makoto is a Chrysanthemum variety that is not commercially produced in Korea, and any Makoto found in the Korean market is imported from China or Vietnam. All the false negatives were attributed to Iwa-no-hakusen produced in Imsil, Jeolla-do, a region that was also not included in the calibration set (Additional file 1: Table S1).

The discriminative marker trends were similar across the two study years, which further supports the robustness of our results. However, future studies should construct a D. grandiflorum database that includes samples originating from Imsil, Jeolla-do. The classification model should then be updated to maintain its accuracy.

Conclusions

In this pilot study, we first obtained the volatile compound profiles of major D. grandiflorum cultivars (Baekgang, **ba, and Iwa-no-hakusen) cultivated in different countries (Korea, China, and Vietnam) and then conducted a multivariate statistical analysis. Static HS–GC–MS is a simple, rapid, and reliable strategy that does not require pretreatment, and it was used in this study to simultaneously identify volatile compounds in D. grandiflorum. A total of 41 volatile components, mainly monoterpenoids, sesquiterpenoids, alcohols, aldehydes, ketones, and fatty acids, were identified in D. grandiflorum. To determine the chemical differences between D. grandiflorum of different origins, PCA and OPLS-DA were then implemented based on the 41 identified components. PCA distinguished between the three cultivars (**ba, Iwa-no-hakusen, and Baekgang), whereas the origin-based OPLS-DA effectively differentiated between Korean and imported D. grandiflorum samples based on their concentration levels. The clustering and volcano plot analyses revealed that non-Korean D. grandiflorum samples contained relatively high levels of sesquiterpenoids, whereas monoterpenoids were abundant in Korean D. grandiflorum. When validated externally with samples collected in a different year, as would be required for use in the control of illegal activities, the OPLS-DA discrimination model demonstrated an accuracy of 93.3%.

The present study was performed using a limited sample size, and the findings do not reliably represent all three genotypes and their respective geographic origins. However, the 41 markers remained robust over two years. As metabolic patterns can vary depending on environmental factors, geographical databases must be continuously expanded to maintain the classification accuracy for routine analysis. The relationships identified here between cultivars, origins, and the volatile substances detected in D. grandiflorum can serve as a foundation for biomarker selection and quality control in the food and cosmetic industries.