Introduction

The rapid and accurate identification of bacterial strains is essential for effective disease control, treatment, and prevention, and for ensuring the safety and quality of food products and the environment [1]. Traditional bacterial recognition methods, including bacterial culture [2], microscopic observation of bacterial samples [3], flow cytometry [4], and enzyme-linked immunosorbent assays (ELISA) [5], are well established and reliable. However, they are limited by long culture periods, the need for advanced analytical equipment and specialized technical expertise, and location-specific restrictions, all of which impede their broad application.

Fluorescence analysis technology, known for its high sensitivity, specificity, and accuracy, is employed extensively in sensor development [6]. Current research in bacterial detection via fluorescence sensors focuses primarily on the unique interactions between the sensor and bacteria, leading to fluorescence quenching [7]. The recognition elements include antimicrobial peptides [8, 9], antibodies [1, the addition amounts of Triton X-100 and ethylene glycol were set at 5% and 10%, respectively.

Table 1 Parameters of three fluorescence inks

Identification of bacteria by sensor-array platform

To evaluate the ability of the fluorescence sensor array to identify multiple bacterial strains, five bacteria differing in size and morphology were selected: Pseudomonas aeruginosa, Escherichia coli, Staphylococcus aureus, Salmonella typhimurium, and Listeria monocytogenes. Each strain, at concentrations ranging from 1.0 × 10³ to 1.0 × 10⁷ CFU/mL, was incubated with each of the three sensing units, and the responses were measured using the developed platform.

Figure 3A shows the relative signal intensity variations (ΔRGB) of the three types of CQDs after interaction with the five bacteria at 1.0 × 10³ CFU/mL. Each bacterial strain produces a distinct ΔRGB pattern across the three CQDs. This can be attributed to the different binding abilities of the bacteria to the CQDs, which depend on bacterial size, morphology, and surface chemical composition. The ΔRGB values were then analyzed using pattern recognition methods, including several machine learning (ML) algorithms.
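The exact ΔRGB definition is not given in this section. As a minimal sketch, assuming that each printed CQD spot is read from a smartphone photo and that ΔRGB is the relative drop in summed R + G + B intensity, (I₀ − I)/I₀, between a blank spot and a bacteria-exposed spot, a three-element response vector (one value per CQD) could be computed as follows. The pixel format, the toy values, and the function names are hypothetical.

```python
from statistics import mean

# Hypothetical pixel format: (R, G, B) values sampled from one printed spot
# in a smartphone photo.
Pixel = tuple[int, int, int]


def spot_intensity(pixels: list[Pixel]) -> float:
    """Mean summed R + G + B intensity of one spot (assumed readout)."""
    return mean(r + g + b for r, g, b in pixels)


def delta_rgb(blank: list[Pixel], exposed: list[Pixel]) -> float:
    """Relative quenching of one spot, (I0 - I) / I0 (assumed definition)."""
    i0 = spot_intensity(blank)
    i = spot_intensity(exposed)
    return (i0 - i) / i0


def fingerprint(blanks: list[list[Pixel]], exposed: list[list[Pixel]]) -> list[float]:
    """One ΔRGB value per CQD, giving a three-element response vector."""
    return [delta_rgb(b, e) for b, e in zip(blanks, exposed)]


# Toy data for three CQD spots (illustrative, not measured values).
blanks = [[(100, 150, 50), (100, 150, 50)], [(200, 50, 50)], [(60, 60, 180)]]
exposed = [[(80, 120, 40), (80, 120, 40)], [(120, 45, 45)], [(60, 60, 180)]]
print(fingerprint(blanks, exposed))
```

With three CQDs the fingerprint is a three-dimensional vector, which is consistent with the three canonical factors used later for discrimination.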

Fig. 3

(A) Relative signal intensity variation (ΔRGB) of the three types of CQDs reacted with five types of bacteria at a concentration of 1.0 × 10³ CFU/mL. All values are the means of five replicates. (B) Identification accuracy for unknown individual samples. (C) Parallel coordinates plot of the three canonical factors (Factor 1, Factor 2, Factor 3) of unknown individual samples. (D) Confusion matrix plot of the results of the three classification algorithms for the identification of individual unknown samples

During the training phase, 70% of the available data were randomly selected as the training set, and the remaining 30% were used as the testing set. For each dataset, one or more ML algorithms were retained according to their performance. To avoid assuming that any one algorithm was superior, we applied the classification algorithms built into MATLAB for bacterial discrimination: k-nearest neighbors (KNN), naive Bayes, decision trees, linear discriminant analysis (LDA), and support vector machines (SVM).
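The 70/30 random split and one of the listed classifiers (KNN) can be sketched in plain Python. This is not the MATLAB workflow used in the study; it is an illustrative stand-in that assumes each sample is a three-element ΔRGB vector. The helper names, the fixed random seed, and the default k = 3 are assumptions.

```python
import math
import random
from collections import Counter


def train_test_split(samples, labels, train_frac=0.7, seed=0):
    """Randomly assign about 70% of samples to training and the rest to testing."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)  # fixed seed: reproducible split
    cut = round(train_frac * len(idx))
    train, test = idx[:cut], idx[cut:]
    return ([samples[i] for i in train], [labels[i] for i in train],
            [samples[i] for i in test], [labels[i] for i in test])


def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k training vectors nearest to x (Euclidean)."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def knn_accuracy(train_x, train_y, test_x, test_y, k=3):
    """Fraction of test samples whose predicted strain matches the true one."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)
```

Fixing the seed makes the split reproducible; in practice the split is often stratified by class so that every strain appears in both the training and testing sets.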

The dataset comprised 122 samples: 50 individual bacterial samples, 24 binary mixtures, and 48 ternary mixtures prepared from the five bacterial strains. This dataset was used to train several algorithms, whose performance was fairly consistent across the three subsets, with minor variations. The accuracies of all algorithms are shown in Figs. 3B and S10. Three algorithms achieved 100% accuracy on the entire dataset: (i) decision tree, (ii) linear discriminant analysis, and (iii) naive Bayes.

Three canonical factors (Factor 1, Factor 2, and Factor 3) were generated for discrimination. To examine the relationships among these factors, we used a parallel coordinates plot (Figs. 3C and S11), which draws each multivariate sample as a line across parallel axes and is well suited to revealing trends in multivariate data. The confusion matrix plots (Figs. 3D and S12) further confirm that the three algorithms listed above achieved 100% accuracy.
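Overall accuracy follows directly from a confusion matrix: it is the sum of the diagonal (correct predictions) divided by the total number of samples, so 100% accuracy means every off-diagonal entry is zero. A minimal sketch, using hypothetical strain abbreviations as class labels:

```python
from collections import Counter


def confusion_matrix(true_labels, pred_labels, classes):
    """Rows = true class, columns = predicted class, entries = sample counts."""
    counts = Counter(zip(true_labels, pred_labels))
    return [[counts[(t, p)] for p in classes] for t in classes]


def overall_accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical labels: PA, EC, SA, ST, LM for the five strains.
classes = ["PA", "EC", "SA", "ST", "LM"]
true = ["PA", "EC", "SA", "ST", "LM", "EC"]
pred = ["PA", "EC", "SA", "ST", "LM", "SA"]  # one E. coli sample misread as S. aureus
m = confusion_matrix(true, pred, classes)
print(overall_accuracy(m))
```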

To further validate the discriminatory capability of the ML-assisted sensor array, we constructed a 3D feature space plot from the three canonical factors of a subset of individual bacterial samples (Fig. 4A). Each point represents the relative signal intensity variation of the three CQDs after reaction with a specific bacterial strain. The five bacterial strains formed distinct clusters, with 100% identification accuracy for each type. A plot of the two most significant discrimination factors (Fig. 4B) likewise shows that all individual bacterial samples were identified without error, underscoring the high precision of the ML-assisted sensor array. Binary and ternary mixtures were also accurately identified (Fig. S13). In addition, the Euclidean distances among the five types of bacteria were calculated to classify them.
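This section does not specify how the Euclidean distances were used for classification. One plausible reading, sketched here as an assumption, is to average each strain's canonical-factor vectors (Factor 1 to Factor 3) into a centroid, report the pairwise centroid distances, and assign an unknown sample to the nearest centroid. All function names are hypothetical.

```python
import math
from itertools import combinations


def centroids(points, labels):
    """Mean canonical-factor vector for each bacterial class."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {lab: [sum(c) / len(ps) for c in zip(*ps)] for lab, ps in groups.items()}


def pairwise_distances(cents):
    """Euclidean distance between every pair of class centroids."""
    return {(a, b): math.dist(cents[a], cents[b])
            for a, b in combinations(sorted(cents), 2)}


def nearest_class(point, cents):
    """Assign a sample to the class whose centroid is closest (assumed rule)."""
    return min(cents, key=lambda lab: math.dist(point, cents[lab]))
```

Larger centroid distances indicate strains that are easier to separate; a nearest-centroid rule is only one of several ways such distances could drive classification.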

Fig. 4

(A) Feature space plot of five types of bacteria using machine learning algorithms. (B) Canonical score plot for the identification of individual bacterial samples. The bacteria concentration was 1.0 × 10³ CFU/mL. (C) Canonical score plot for the distinction of E. coli at different concentrations. (D) Score plot of Factor 1 versus the concentration of E. coli. Error bars represent the standard deviation from three parallel tests. (E) Canonical score plot for the identification of unknown individual bacterial samples. (F) Canonical score plot for the identification of individual unknown bacterial samples in tap water. In each case, the bacteria concentration was 1.0 × 10³ CFU/mL

To determine the limit of detection (LOD) of the sensor array, we performed a quantitative analysis of the five types of bacteria. The fluorescence responses of the sensor array to varying bacterial concentrations were transformed into canonical score plots. As shown in Fig. 4C, the sensor array clearly distinguishes different concentrations of E. coli, with Factor 1 correlating with bacterial concentration. Figure 4D shows a strong linear relationship between Factor 1 and the concentration of E. coli. The sensor array performed similarly in distinguishing different concentrations of the other four bacteria (Fig. S14). The LODs for the five types of bacteria are summarized in Table S1. These results demonstrate that the sensor array can be used for the simultaneous identification and quantification of bacteria.
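The LOD calculation itself is not described in this section. A common convention, assumed in this sketch, is to fit a calibration line and take the LOD as the concentration at which the fitted response departs from the blank mean by three blank standard deviations (3σ). The sketch further assumes that Factor 1 is linear in log₁₀ of the concentration. Both assumptions and all names are hypothetical, and the values in Table S1 were not necessarily obtained this way.

```python
import math
from statistics import mean, stdev


def linear_fit(x, y):
    """Ordinary least-squares slope m and intercept b of y = m*x + b."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    m = sxy / sxx
    return m, my - m * mx


def lod_from_calibration(concs, scores, blank_scores):
    """Concentration (CFU/mL) where the fitted score leaves the blank by 3σ.

    Assumes Factor 1 is linear in log10(concentration).
    """
    m, b = linear_fit([math.log10(c) for c in concs], scores)
    # Step away from the blank mean in the direction the signal moves
    # as concentration increases (sign of the slope).
    threshold = mean(blank_scores) + math.copysign(3 * stdev(blank_scores), m)
    return 10 ** ((threshold - b) / m)
```

Working in log₁₀ space keeps the 10³ to 10⁷ CFU/mL calibration points evenly spaced; if the response were linear in raw concentration, the familiar LOD = 3σ/slope form would apply instead.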

Identification of blind and real samples

The practicality of the platform was assessed using blind samples consisting of individual bacteria, binary mixtures, and ternary mixtures of the five bacterial strains, each at a total concentration of 1.0 × 10³ CFU/mL. As shown in Figs. 4E and S15, a total of 45 blind samples were correctly identified (results listed in Table S2). Applicability to real samples was further tested using 12 unknown bacterial samples spiked into tap water, each at 1.0 × 10³ CFU/mL. All of these samples were successfully distinguished in the tap water matrix (Fig. 4F; results listed in Table S3). These results support the platform's suitability for identifying the five types of bacteria in real samples.

Conclusion

In this study, we developed a paper-based fluorescence sensor-array platform designed for the identification of various bacterial strains. Three antibiotic-modified CQDs, each with distinct bacterial binding affinities, were utilized as the sensing units. These CQDs were formulated into fluorescent inks with carefully adjusted surface tension and viscosity. The paper-based sensor array was then created by printing an array pattern directly onto filter paper using an inkjet printer.

The differential binding abilities of bacteria to the CQDs resulted in differences in the fluorescence quenching efficiency of the CQDs. Leveraging this principle, we established unique identification fingerprints for each bacterial strain on the proposed sensor-array platform. This platform enables rapid on-site discrimination of bacteria by utilizing smartphones and integrating multiple machine learning algorithms.

Furthermore, we validated the platform's applicability by successfully differentiating between bacterial strains in real samples and accurately identifying blind samples. This platform shows great promise as a tool for on-site testing, with significant potential applications in various fields such as food safety assessment, disease diagnosis, and environmental pollution detection.