1 Introduction

Neuromorphic computing is an effective way to break the bottleneck of von Neumann computing at the architectural level. As a novel paradigm of neuromorphic computing, reservoir computing (RC) [16, 22, 36] has a fundamental structure comprising an input layer, a reservoir layer, and an output layer, as illustrated in Fig. 1a. The input layer and reservoir layer adopt random, fixed connection weights, and only the weights of the output layer need to be trained. This effectively addresses issues such as vanishing or exploding gradients, slow convergence, and high computational cost encountered when training classical RNNs. The reservoir layer nonlinearly maps low-dimensional input signals to a high-dimensional space, which provides a lightweight network scheme for efficient processing of sequential signals.

Fig. 1 Schematic diagrams of classical RC and time-delay RC

According to the topology of the reservoir layer, RC can be divided into two categories [32]: classical RC and time-delayed RC. As shown in Fig. 1a, the reservoir layer of classical RC consists of numerous randomly connected nodes. Realizing random connections among many physical nodes makes the hardware implementation of classical RC difficult and its structure complex. In 2011, L. Appeltant et al. [4] proposed the time-delayed RC, which greatly simplifies hardware implementation. Specifically, a single Mackey–Glass oscillator is used as the nonlinear node (see NL in Fig. 1b). After the input signal is multiplied by the mask matrix, the delay feedback loop of length \(\tau \) is divided into N virtual nodes according to the principle of time-division multiplexing, replacing the randomly connected nodes of the classical RC. However, the time-delayed RC system still suffers from multi-parameter optimization problems such as mask matrix construction [3] (including mask length and mask signal type [18]) and delayed feedback control strength [9, 20, \(98.44\%\) on MNIST dataset, outperforming the parallel architecture RC system [41] based on the same dynamic memristor by \(0.45\%\) and demonstrating only a slight decrease of \(0.14\%\) compared to the classical RC system [27] with 1200 physical nodes. This result indicates a balance between the recognition accuracy and the resource consumption of the RC system. Meanwhile, the recognition accuracy on the Fashion-MNIST dataset is \(90.19\%\), surpassing the laser-based RC system [14] by a significant \(5.53\%\). In addition, the impact of the mapping voltage range, a crucial parameter of the proposed system, on the operational state of the dynamic memristor and on image recognition performance is extensively investigated. 
Besides, the feature extraction performance of the proposed RC system is validated by reducing the scale of the training set, while its noise robustness is examined by recognizing noisy images. Finally, to further verify the feasibility of the proposed RC system on complex image recognition tasks, two real-world color image datasets, CIFAR-10 and Cropped SVHN (the cropped version of Street View House Numbers), are used to evaluate the proposed RC system.

Table 1 lists all abbreviations used in this paper.

Table 1 Abbreviations used in this paper

The remainder of this paper is organized as follows: Section 2 summarizes the related work on time-delayed RC systems. Section 3 introduces the theoretical model of the dynamic memristor and proposes a nonmasking-based RC system with a single dynamic memristor, along with a demonstration of the specific image recognition process. In Sect. 4, the proposed RC system is applied to image recognition tasks, and the experimental results are discussed. The final section concludes this study.

2 Related work

In view of the issues mentioned above, this section gives a brief overview of previous research on the hardware implementation of RC systems, focusing on data input and the structural design of time-delayed RC systems. Table 2 summarizes the related work in these two major aspects.

Table 2 The research status of hardware-implemented RC

2.1 Data processing in input layer

At present, the input image processing methods of RC systems in image recognition tasks are mainly as follows: 1) Employing the pixel values of the original image directly as input signals [30, 40]. 2) Applying geometric transformations. In the classical RC system, for example, the boundary of each original image is removed, and the image is then rotated by different angles and spliced into the final input image to enhance the feature information [27]. The dynamic memristor-based parallel reservoir computing system rotates the original image by 0\(^{\circ }\), 30\(^{\circ }\), and 90\(^{\circ }\), combines the three rotated images, and then cuts them vertically [41]. The pixel values of each cut image are used as the input signal. 3) Extracting HOG features from the original image. For instance, the photonic RC system in [20] introduced the HOG technique at the input layer to extract feature descriptors from the original image and use them as input. Huang et al. [14] similarly applied the HOG technique to process input images. Yue et al. [39] compared four distinct methods for preprocessing input images within an optoelectronic RC system. Their findings indicated that using the HOG technique to extract image features as input leads to comparatively higher recognition accuracy.

2.2 Structural design of time-delayed RC systems

The structural design of a time-delayed RC system mainly involves three aspects: the construction of the mask matrix, the design of the delay feedback loop, and the architecture of the nonlinear node.

The construction of the mask matrix encompasses matrix design algorithms, the selection of mask signal types, and the determination of the mask length. Based on the concept of maximum length sequences, Appeltant et al. [3] outlined a procedure to construct an optimal mask pattern. This procedure yields the shortest possible mask that leads to maximum variability in the reservoir states. Compared with a random mask, the optimal mask pattern gives the RC system more stable performance. Kuriki et al. [18] used four distinct types of mask signals (binary mask, six-level mask, random-level mask, and chaos mask) in a photonic RC system to investigate how mask signal design can improve the performance of time-delayed RC systems. Zhong et al. [41] extensively examined the reservoir state and signal separation capability across varying mask lengths, contributing a theoretical foundation for determining the optimal mask length.

The design of the delay feedback loop mainly includes feedback intensity adjustment and loop structure design. Chen et al. [9] introduced a new hidden layer on the basis of time-delayed RC to construct an RC system with double feedback loops. This enhancement led to a significant improvement in performance on various tasks, including time series prediction, speech recognition, and nonlinear channel equalization. You et al. [38] proposed an RC system with a multilayer time-delay structure and double feedback loops. The double feedback loop structure re-injected earlier-generated responses into the reservoir, enhancing the storage capacity of the RC system. The serial multilayer structure significantly improved the utilization of virtual nodes. Li et al. [20] proposed an optical reservoir computing method based on a single physical node, which used an optically injected semiconductor laser with self-delay feedback as the reservoir. By setting multiple delay times, the number of virtual nodes was increased to enhance system performance.

The architecture of nonlinear nodes mainly includes single, parallel, and serial multilayer architectures. In classical RC [27], the nonlinear nodes are connected randomly within the reservoir layer. In time-delayed RC systems, by contrast, the responses of a single nonlinear node to the input signal serve as the reservoir states within the reservoir layer. Time-delayed RC systems [3, 9, 18, 20] with a single nonlinear node architecture improved performance through the strategies mentioned above, such as mask matrix construction or loop structure design. Most studies arranged multiple reservoir layers in a parallel or serial multilayer architecture to enhance the performance of time-delayed RC systems [10, 14, 21, 23, 38, 39, 41]. For instance, Du et al. [10] utilized 88 dynamic memristors to implement a time-delayed RC system with parallel architecture. The system was deployed for the classification of handwritten digit images, resulting in an accuracy of 88.1\(\%\). Moon et al. [23] used a 32\(\times \)32 array chip of \(WO_x\) memristors to implement a time-delayed RC system with parallel architecture. In the task of spoken-digit recognition, the system achieved a classification accuracy of up to 99.2\(\%\), even with partial input. Additionally, good long-term predictions of chaotic sequences can be achieved without retraining. Zhong et al. [41] proposed a time-delayed RC system with parallel architecture based on dynamic memristors, which outperforms most existing hardware-based RC systems in tasks such as image recognition, waveform classification, spoken-digit recognition, and chaotic sequence prediction. Liu et al. [21] constructed a serial multilayer RC system based on ferroelectric \(\alpha \)-\(In_{2}Se_{3}\) devices with voltage input and output. The deep reservoir architecture was validated for its high memory capacity and powerful computational capabilities in tasks such as time series prediction and waveform classification. 
The time-delayed RC system with a multilayer architecture in [38] enhanced the utilization of reservoir virtual nodes. This multilayer architecture also led to reduced prediction errors and improved resistance to interference in diverse time series prediction tasks.

3 Theoretical model and process of image recognition

This section first introduces the theoretical model of the dynamic memristor. Subsequently, a nonmasking-based RC system that uses a dynamic memristor as the nonlinear node in the reservoir layer is constructed. Finally, the specific process of image recognition is illustrated.

3.1 Theoretical model of dynamic memristor

The dynamic memristor introduced in [41] has a structure of \(Ti/TiO_x/TaO_y/Pt\), exhibiting I-V nonlinearity and short-term memory characteristics. On this basis, the constructed parallel RC system showed exceptional performance in tasks such as waveform classification, spoken-digit recognition, and Hénon map prediction. These results demonstrated the potential of the dynamic memristor to act as a nonlinear node in a time-delayed RC. According to [41], the theoretical model of the dynamic memristor is defined as

$$\begin{aligned} \left\{ \begin{array}{l} I = KG{V^3}\\ G = (1 - r){G_0} + \left( r - \frac{{\alpha |V|}}{{\alpha |V| + 1}}\right) G' + \frac{{\alpha |V|}}{{\alpha |V| + 1}}{G_{th}} \end{array}\right. , \end{aligned}$$
(1)

where

$$\begin{aligned} \left\{ \begin{array}{l} K = {K_p},{G_{th}} = 1,\quad \mathrm{{if }}\quad V \ge 0,\\ K = {K_n},{G_{th}} = 0,\quad \mathrm{{else}}, \end{array} \right. \end{aligned}$$
(2)

V and I represent the input voltage and output current of the dynamic memristor, respectively. G and \(G'\) denote the conductance values of the dynamic memristor at the current time step and the previous time step, respectively. The parameters K and \(G_{th}\) are determined by the sign of V as shown in Eq. (2). Other parameters are set as listed in Table 3. According to Eqs.(1) and (2), the simulated current–voltage (I-V) curve of the dynamic memristor can be obtained, as shown by the blue solid line in Fig. 3. Meanwhile, the black dashed line in Fig. 3 represents the actual I-V curve of the device, which is obtained from experimental data provided in [41]. It can be observed from Fig. 3 that the blue solid line approximately overlaps with the black dashed line, capturing the general trend of the black dashed line. Therefore, the theoretical model of dynamic memristor, described by Eqs.(1) and (2), matches well with the actual physical characteristics of the device and can be utilized for subsequent simulation experiments.

Fig. 3 The I-V curve of dynamic memristor

Table 3 Parameters of the dynamic memristor model
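To make the state update of Eqs. (1) and (2) concrete, the following is a minimal Python sketch of one possible discrete-time implementation. The numerical values in `PARAMS` are hypothetical placeholders chosen only so that the code runs; they are not the values of Table 3. Starting the conductance at \(G_0\) is likewise an assumption.

```python
import numpy as np

# Hypothetical placeholder values, NOT the parameters listed in Table 3.
PARAMS = dict(K_p=1.0e-4, K_n=1.0e-5, alpha=0.5, r=0.99, G_0=0.5)

def memristor_step(v, g_prev, p=PARAMS):
    """One update of Eqs. (1)-(2): returns (current I, conductance G)."""
    # Eq. (2): K and G_th depend on the sign of the input voltage.
    if v >= 0:
        k, g_th = p["K_p"], 1.0
    else:
        k, g_th = p["K_n"], 0.0
    a = p["alpha"] * abs(v)
    w = a / (a + 1.0)                      # alpha|V| / (alpha|V| + 1)
    # Eq. (1), second line: G' is the conductance of the previous step.
    g = (1.0 - p["r"]) * p["G_0"] + (p["r"] - w) * g_prev + w * g_th
    # Eq. (1), first line: I = K G V^3.
    i = k * g * v ** 3
    return i, g

def memristor_response(voltages, g_init=None, p=PARAMS):
    """Drive the model with a voltage sequence and collect the currents."""
    g = p["G_0"] if g_init is None else g_init   # assumed initial state
    currents = np.empty(len(voltages))
    for n, v in enumerate(voltages):
        currents[n], g = memristor_step(v, g, p)
    return currents
```

Because each conductance value depends on the previous one, the current at a given step carries information about earlier inputs, which is the short-term memory the text refers to.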

3.2 Nonmasking-based RC system with a single dynamic memristor

In order to simplify the structure of the RC system and minimize the number of system parameters that need to be optimized, a nonmasking-based RC system with a single dynamic memristor is proposed in this study. As depicted in Fig. 4, the proposed RC system consists of three parts: an input layer, a reservoir layer in which a single dynamic memristor is utilized as the nonlinear node, and an output layer. In the input layer, the feature information of the input signals is extracted and linearly mapped into an appropriate voltage sequence. In the reservoir layer, the voltage sequence is directly injected into the dynamic memristor for iteration, allowing the feature information to be integrated into the iterative states of the dynamic memristor. Each iterative state of the dynamic memristor is regarded as a virtual node in the reservoir. Additionally, the feedback loop present in the conventional time-delayed RC system [4] is omitted. Here, the linear mapping process sets the input weights in a manner similar to the traditional masking procedure. The combination of feature extraction and iterative state update of the dynamic memristor is equivalent to expanding the input signal in the time domain, which allows the traditional masking procedure of the conventional time-delayed RC system to be removed. In the output layer, the current responses \(X_i= [x_\mathrm{{1}}, x_\mathrm{{2}}, \cdots , x_\mathrm{{N}}]^\mathrm{{T}}\) corresponding to the input voltage sequences \(V_i= [v_\mathrm{{1}}, v_\mathrm{{2}}, \cdots , v_\mathrm{{N}}]^\mathrm{{T}}\) are directly collected as the states of the reservoir, so that additional operations for reading the conductance of the memristor can be skipped. When all the images in the training set have been input, the overall reservoir states collected during the training phase are denoted as \(X=[ X_1, X_2, \cdots , X_n]\), where n is the size of the training set. 
By using Tikhonov regularization, we minimize the mean square error between the system output and the desired output. Therefore, the output weight matrix \({W^{out}}\) can be obtained as

$$\begin{aligned} {W^{out}} = {Y_d}{X^T}{(X{X^T} + \beta I)^{ - 1}}, \end{aligned}$$
(3)

where \({Y_d} = [{y_1},{y_2}, \cdots , {y_k}, \cdots ,{y_n}]\) represents the desired output (\(y_k\) as shown in Fig. 5), I is an identity matrix, and \(\beta \) represents the ridge parameter, which is set to avoid overfitting to the training data.
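As an illustration of Eq. (3), the sketch below computes the output weights with NumPy. The function name and the array shapes (states stored column-wise, one column per training image) are assumptions made for this example. A linear solve is used in place of an explicit matrix inverse, which gives the same result up to rounding because \(X{X^T} + \beta I\) is symmetric.

```python
import numpy as np

def train_output_weights(X, Y_d, beta):
    """Eq. (3): W_out = Y_d X^T (X X^T + beta I)^(-1).

    X   : (N, n) reservoir states, one column per training image
    Y_d : (k, n) label matrix, one one-hot column per training image
    beta: ridge parameter
    """
    N = X.shape[0]
    A = X @ X.T + beta * np.eye(N)
    # A is symmetric, so W_out^T solves A W_out^T = X Y_d^T.
    return np.linalg.solve(A, X @ Y_d.T).T
```

The returned matrix has shape (k, N), so multiplying it by a matrix of test states yields one column of class scores per test image, as in Eq. (4).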

Fig. 4 Illustration of nonmasking-based RC system architecture

In the test phase, all the reservoir states \(X_{test}\) of the signal to be identified are collected. Combined with the output weight matrix \(W^{out}\) calculated by Eq.(3), the predicted output \(Y_p\) can be obtained as:

$$\begin{aligned} {Y_p} = {W^{out}}{X_{test}}. \end{aligned}$$
(4)

Therefore, the proposed RC system avoids parameters such as the mask matrix, the number of virtual nodes, and the feedback strength, which need to be optimized in the conventional time-delayed RC system. Instead, the only parameter that needs to be optimized to improve recognition accuracy is the voltage range into which the feature information is mapped.

3.3 The process of nonmasking-based RC system with a single dynamic memristor for image recognition

In this study, the image recognition performance of the proposed nonmasking-based RC system with a single dynamic memristor is first evaluated using the MNIST and Fashion-MNIST datasets. Furthermore, to verify the feasibility of the proposed RC system on more complex image recognition tasks, experiments are carried out on two more challenging color image datasets: CIFAR-10 and Cropped SVHN. The MNIST dataset comprises 70,000 images of handwritten digits from 0 to 9, written by 250 individuals. The Fashion-MNIST dataset contains 70,000 grayscale images of fashion products, categorized into 10 classes. Both datasets are originally split into training and testing sets with 60,000 and 10,000 images, respectively. Each image in these two datasets has \(28 \times 28\) grayscale pixels. The CIFAR-10 dataset, which contains 10 categories of real-world object images, is divided into 50,000 training images and 10,000 testing images. There are 6000 color images of size \(32 \times 32\) in each category. The Cropped SVHN dataset is obtained by cropping Google Street View images. It contains 10 categories of \(32 \times 32\) color images, of which 73,257 images are used for training and 26,032 images are used for testing. Examples of sample images and corresponding labels for each category are illustrated in Fig. 5, where \(y_k\) represents the label vector of the corresponding category, whose number of elements equals the number of categories in the dataset (i.e., 10). Moreover, \(y_k\) is a column vector of the label matrix \(Y_d = [ y_1, y_2,..., y_k,..., y_n]\). Suppose that the label of a sample image is \(i \in \{ 0,1,...,9\} \); then the \((i+1)\)-th element of \(y_k\) is set to 1 and all other elements are set to 0. For example, both the digit “2” in the MNIST dataset and the “Pullover” item in the Fashion-MNIST dataset have the label “2”. Therefore, the corresponding label vector \(y_k\) is set to \((0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0\ 0)^\mathrm{{T}}\). 
Figure 6 schematically illustrates the image recognition process of the proposed RC system. Firstly, the HOG features are extracted from each input image, resulting in a \(1 \times N\) feature descriptor. Subsequently, the feature descriptor is linearly mapped into the input voltage sequence for the dynamic memristor, and the corresponding output currents are regarded as virtual nodes in the reservoir layer. The virtual nodes in the reservoir layer are coupled with each other under the excitation of the input voltage sequence, achieving a nonlinear mapping of image feature information (voltage) to reservoir states (current).
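The label encoding and the linear mapping step can be sketched as follows. The text does not give the exact form of the linear mapping, so the min-max scaling of each descriptor onto \([V_{min},V_{max}]\) used here is an assumption, and the function names are illustrative only.

```python
import numpy as np

def one_hot(label, num_classes=10):
    """Label vector y_k: element (label + 1) is 1, all others are 0."""
    y = np.zeros(num_classes)
    y[label] = 1.0
    return y

def map_to_voltage(features, v_min, v_max):
    """Linearly map a feature descriptor onto [v_min, v_max].

    Assumed form: per-descriptor min-max scaling.
    """
    f = np.asarray(features, dtype=float)
    span = f.max() - f.min()
    if span == 0:
        # A constant descriptor carries no contrast; place it at v_min.
        return np.full_like(f, v_min)
    return v_min + (f - f.min()) / span * (v_max - v_min)
```

For instance, `one_hot(2)` gives the label vector of the digit “2” described above, and `map_to_voltage([0, 0.5, 1], -0.9, -0.1)` places the smallest feature at \(-\)0.9 and the largest at \(-\)0.1.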

Fig. 5 Schematic of sample dataset with corresponding labels and label vectors

Fig. 6 Illustration of image recognition process

The process is then divided into a training phase and a testing phase. In the training phase, each sample image from the training set is input in turn. The label vector \(y_k\) serves as a column in the label matrix \(Y_d\), and the corresponding reservoir state is collected as a column in the state matrix X. After all sample images from the training dataset have been input, the label matrix \(Y_d\) and the state matrix X are obtained. According to Eq.(3), the output weight matrix \(W^{out}\) can be calculated. In the testing phase, all reservoir states \(X_{test}\) of the sample images in the testing set are first collected. Then, the predicted output \(Y_p\) can be calculated according to Eq.(4). The number of rows of the matrix \(Y_p=[y^1_p,y^2_p,...,y^k_p,...,y^m_p]\) is equal to the number of categories in the dataset, m is the size of the testing set, and \(y^k_p\) represents the output for the k-th testing image. Finally, the winner-takes-all strategy is applied to \(y^k_p\) to recognize the category of the test image. As shown in Fig. 6, taking “2”, “Pullover”, “bird” or “2” as an example, their labels are all 2. If the third element \(p_2\) of the output column vector \(y^k_p =(p_0,p_1,p_2,...,p_9)^\mathrm{{T}}\) is the maximum value, the recognized result will be “2”, “Pullover”, “bird” or “2”. In this case, the recognition of the test image is correct; otherwise, it is incorrect.

In this study, the performance of the RC system for image recognition is evaluated using the recognition accuracy (ACC). Assuming that the total number of samples in the testing set is t and the number of correctly recognized image samples is c, one has:

$$\begin{aligned} ACC = \frac{c}{t} \times 100\%. \end{aligned}$$
(5)
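The testing phase, the winner-takes-all decision, and the accuracy of Eq. (5) can be sketched together in a few lines. The function names are illustrative, and the states are assumed to be stored column-wise, one column per test image.

```python
import numpy as np

def predict_labels(W_out, X_test):
    """Eq. (4) followed by winner-takes-all."""
    Y_p = W_out @ X_test            # shape: (categories, m)
    return np.argmax(Y_p, axis=0)   # index of the largest element per column

def accuracy(predicted, true):
    """Eq. (5): ACC = c / t * 100 (in percent)."""
    predicted = np.asarray(predicted)
    true = np.asarray(true)
    return np.mean(predicted == true) * 100.0
```

Since labels start at 0, the index returned by `argmax` is the predicted label itself, so a maximum at the third element \(p_2\) yields the label 2.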

4 Experimental results and analysis

4.1 Experiment results and comparative analysis

The proposed RC system is implemented in MATLAB (version R2021a, 64-bit) on a hardware platform with an Intel(R) Core (TM) i7-10700K CPU @ 3.80GHz and 64 GB RAM. Following the original partition of the training/testing sets in the MNIST and Fashion-MNIST datasets, the sample images in the training/testing sets are input in random order. The optimized values of the mapping voltage range \([V_{min},V_{max}]\) in the recognition tasks on the MNIST and Fashion-MNIST datasets are [\(-\)0.9, \(-\)0.1] and [\(-\)0.9,\(-\)0.4], respectively, and the corresponding recognition accuracies are \(98.44\%\) and \(90.19\%\), respectively. The confusion matrices of the recognition results are depicted in Fig. 7. Figure 7a shows the experimental result of the proposed RC system on the MNIST dataset. The recognition accuracies for digits “0” and “1” both exceed \(99\%\), and the recognition accuracies for digits “2”, “3”, “5”, “6”, “7”, and “8” are over \(98\%\). Although the recognition accuracy for the digit “9” is relatively low, it is still \(97\%\). In addition, Fig. 7b shows the experimental result of the proposed RC system on the Fashion-MNIST dataset. The recognition accuracy for “Bag” is the highest, reaching 97.6\(\%\), owing to its significant difference in shape from the other items. The recognition accuracies for “Trouser”, “Dress”, “Sandal”, “Sneaker”, and “Ankle boot” are over \(90\%\). However, the recognition accuracy for “Shirt” is only \(68.2\%\), with \(12.1\%\), \(8.4\%\), and \(5.9\%\) being wrongly recognized as “T-shirt/top”, “Coat”, and “Pullover”, respectively. The main reason for these misclassifications is that the items are quite similar to each other, so the differences between the corresponding HOG feature descriptors are small, leading to confusion during the recognition process.

Fig. 7 Experimental results for image recognition tasks

The performance comparison of several RC systems on recognition tasks is shown in Table 4. For the recognition task on MNIST dataset, the proposed RC system is compared with the Parallel RC system [41] using the same type of dynamic memristors and the Classical RC system [27]. In terms of the number of nonlinear nodes, the proposed RC system in this paper utilizes only one nonlinear physical node, while the Parallel RC system and the Classical RC system require 300 and 1200 nonlinear physical nodes, respectively. Regarding the preprocessing process, both the Parallel RC system and the Classical RC system mostly employ methods such as removing redundant boundary and concatenating images with different rotation angles to enhance feature information. The pixel values of the preprocessed images are finally used as input signals. In contrast, the proposed RC system references preprocessing procedures from machine learning, extracting HOG features from input sample images. The size of HOG feature descriptor for each sample image is \(1\times 1980\), which effectively reduces and filters out a large amount of redundant data before being injected into the reservoir layer. This approach avoids the extra computational overhead associated with the masking procedure. The performance comparison results in Table 4 demonstrate that the recognition accuracy of the proposed RC is higher than that of the Parallel RC system by \(0.44\%\) and experiences a slight decrease of \(0.14\%\) compared to the Classical RC system. Furthermore, for the recognition task on Fashion-MNIST dataset, the proposed RC system achieves a recognition accuracy that is \(5.53\%\) higher compared to the Laser-based RC system [14] with two parallel nonlinear physical nodes and masking procedure. The results confirm the effectiveness of the proposed RC system.

Table 4 Performance of several RC systems on recognition tasks

To further investigate the feasibility of the proposed nonmasking-based RC system, Table 5 presents a performance comparison of the proposed RC system with and without the masking procedure. In Table 5, “With masking” indicates the proposed RC system with the traditional masking procedure, while “Nonmasking” means the proposed nonmasking-based RC system. Specifically, “With masking” multiplies the feature descriptor of each sample image by a mask matrix following a Gaussian distribution (mean 0, standard deviation 0.5) and then injects the result into the reservoir layer. The number of virtual nodes is set to 1980 (the same as the size of the HOG feature descriptor), and the voltage range is consistent with the values mentioned above. The test results shown in Table 5 indicate that, compared to “With masking”, the proposed nonmasking-based RC system reduces the training time for recognition tasks on the MNIST and Fashion-MNIST datasets by approximately \(46.1\%\) and \(45.33\%\), respectively. The corresponding recognition accuracies decrease only slightly, by \(0.5\%\) and \(0.35\%\), respectively. This provides a simple and efficient approach to balancing image recognition accuracy and practical resource efficiency. Therefore, the proposed nonmasking-based RC system not only simplifies the nonlinear nodes of the reservoir layer to a single nonlinear physical node, but also further reduces the structural and computational complexity and improves system efficiency.
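For comparison, the “With masking” variant can be sketched as a single matrix product applied before the voltage mapping. The mask shape (descriptor length by number of virtual nodes) and the reuse of one fixed mask for all images are assumptions of this sketch; the text specifies only the Gaussian distribution and the number of virtual nodes.

```python
import numpy as np

def make_gaussian_mask(n_features, n_virtual, rng, std=0.5):
    """Fixed random mask with zero mean and the given standard deviation."""
    return rng.normal(0.0, std, size=(n_features, n_virtual))

def apply_mask(features, mask):
    """Multiply a 1 x N descriptor (or a batch of them) by the mask."""
    return np.atleast_2d(features) @ mask

rng = np.random.default_rng(0)
mask = make_gaussian_mask(1980, 1980, rng)     # generated once, then reused
descriptor = rng.random(1980)                  # stand-in for one HOG descriptor
masked = apply_mask(descriptor, mask)          # shape (1, 1980)
```

With 1980 features and 1980 virtual nodes, this product costs about \(1980^2 \approx 3.9\) million multiplications per image, which is the extra work that the nonmasking scheme skips.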

Table 5 Performance comparison of the proposed RC system with/without masking procedure
Fig. 8 Experimental results under different voltage ranges

4.2 Impact of mapping voltage range on recognition performance of the proposed RC system

The mapping voltage range is a critical factor affecting the operating state of the reservoir layer in the proposed nonmasking-based RC system. Adjusting the upper and lower limits of the mapping voltage is important for making full use of the inherent dynamic characteristics of the dynamic memristor and for finding a favorable operating range. During the simulation, the image recognition accuracy of the proposed RC system under different mapping voltage ranges is calculated using the grid traversal method. Figure 8a and b shows the recognition accuracy distribution of the proposed RC system on the MNIST and Fashion-MNIST datasets, respectively, under different mapping voltage ranges. In Fig. 8a and b, the horizontal axis represents the lower limit of the mapping voltage \(V_{min}\), while the vertical axis represents the upper limit of the mapping voltage \(V_{max}\). The color bar indicates the magnitude of recognition accuracy, with darker colors indicating higher accuracy. As shown in Fig. 8a and b, the recognition accuracy of the proposed RC system changes with the mapping voltage range, exhibiting similar trends across the two datasets. The region with higher recognition accuracy covers a larger range, mainly distributed in the acute angle regions of the right triangle in Fig. 8a and b. Additionally, compared with the acute angle region labeled “B”, the dark area within the acute angle region labeled “A” is larger, indicating that the recognition accuracy is higher when both \(V_{min}\) and \(V_{max}\) are set to negative values. The simulation results indicate that the optimal mapping voltage ranges for the MNIST and Fashion-MNIST datasets are [\(-\)0.9, \(-\)0.1] and [\(-\)0.9, \(-\)0.4], respectively, and the corresponding overall recognition accuracies are \(98.44\%\) and \(90.19\%\), respectively. Furthermore, it can also be observed from Fig. 8a and b that regions closer to the main diagonal correspond to higher recognition accuracy. That is, a smaller difference between the upper and lower mapping voltage limits (\(V_{max}-V_{min}\)) results in higher recognition accuracy. In order to further analyze and validate the impact of this difference on the recognition accuracy, \(V_{max}\) is increased from \(-\)2.9 V to 3 V in increments of 0.1 V, while \(V_{min}\) is fixed at \(-\)3 V. The corresponding recognition accuracy curves of the proposed RC system for the MNIST and Fashion-MNIST datasets are shown in Fig. 8c and demonstrate a similar trend. While \(V_{max}\) is negative, the recognition accuracy remains at a relatively high and nearly constant value. Once \(V_{max}\) exceeds 0 V, the accuracy decreases rapidly and monotonically as \(V_{max}\) increases. This indicates that a larger difference between the upper and lower mapping voltage limits (\(V_{max}-V_{min}\)) leads to lower recognition accuracy.
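The grid traversal can be sketched as an exhaustive search over all pairs with \(V_{min} < V_{max}\), which corresponds to the triangular region of Fig. 8a and b. The grid limits of \(\pm 3\) V and the 0.1 V step are taken from the sweep described above and are assumed to apply to the full grid as well. The `evaluate` callable is a hypothetical stand-in for training and testing the RC system at a given voltage range.

```python
import numpy as np

def voltage_grid(lo=-3.0, hi=3.0, step=0.1):
    """All (v_min, v_max) pairs on the grid with v_min < v_max."""
    # Rounding removes floating-point drift such as -0.8999999999999999.
    levels = np.round(np.arange(lo, hi + step / 2, step), 10)
    return [(a, b) for a in levels for b in levels if a < b]

def grid_search(evaluate, grid):
    """Return the pair with the highest score and that score.

    evaluate(v_min, v_max) is a user-supplied function (hypothetical here)
    that returns the recognition accuracy for the given voltage range.
    """
    scores = [evaluate(v_min, v_max) for v_min, v_max in grid]
    best = int(np.argmax(scores))
    return grid[best], scores[best]
```

With the default settings the grid contains 61 voltage levels and therefore 1830 candidate ranges, each of which requires one full training and testing run.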

Fig. 9 Conductance fluctuations of dynamic memristor versus recognition accuracy

Furthermore, the impact of the internal conductance fluctuation of the dynamic memristor during the recognition process is investigated. Specifically, one image from each category of sample images is randomly selected, and the feature descriptor of the selected image is mapped into a voltage sequence according to different mapping voltage ranges. Then, the voltage sequence is injected into the dynamic memristor. The corresponding conductance response curves for the MNIST and Fashion-MNIST datasets are depicted in Fig. 9a-c and d-f, respectively, and demonstrate a similar trend. When the conductance fluctuation is highly intense, as shown in Fig. 9a and d, the recognition accuracy is as low as \(87.03\%\) and \(77.55\%\), respectively. As the conductance fluctuation becomes smoother, as depicted in Fig. 9b and e, the recognition accuracy increases to \(96.7\%\) and \(85.29\%\), respectively. When the conductance fluctuation is very gentle, as shown in Fig. 9c and f, the recognition accuracy reaches its maximum values of \(98.44\%\) and \(90.19\%\), respectively. This is because a drastic change in the conductance of the dynamic memristor easily drives the reservoir states to their upper or lower limit [41], decreasing the richness of the reservoir states. In other words, the signal separation ability of the proposed RC system is reduced, leading to a larger recognition error. Therefore, to achieve high recognition accuracy with the proposed RC system, the dynamic memristor needs to exhibit gentle conductance fluctuations under the excitation of the voltage sequence. That is, the mapping voltage limits \(V_{min}\) and \(V_{max}\) are required to be negative, and the difference between \(V_{max}\) and \(V_{min}\) is required to be relatively small.

4.3 Impact of the training set scale on recognition performance of the proposed RC system

In principle, a larger training set covers the features of the samples more comprehensively, resulting in higher accuracy. However, it also significantly increases the training time and computational complexity. The impact of the training set scale on the recognition performance of the proposed nonmasking-based RC system is therefore examined on the Fashion-MNIST dataset. In the simulation experiment, the scale of the testing set remains unchanged at 10,000 and the scale of the training set is gradually reduced. The corresponding recognition accuracy of the proposed RC system is shown in Table 6 and exhibits a monotonically decreasing trend. When the scale of the training set is reduced from 60,000 to 20,000, the training time decreases by \(68.67\%\), while the recognition accuracy decreases by only \(0.7\%\). Even in the extreme case where the training set contains only one sample image per category, the recognition accuracy still exceeds \(50\%\). It can be seen that the RC system proposed in this paper can achieve good recognition accuracy with a small training set, providing an effective approach for small-sample image recognition.

Table 6 Experimental results of dataset scale on recognition accuracy and training time
Table 7 Recognition accuracy under different noise parameters

4.4 Noise robustness verification

To verify the robustness of the proposed nonmasking-based RC system, noise interference experiments are performed on the image recognition task using the Fashion-MNIST dataset. The noise parameter of Gaussian noise, Salt & Pepper noise, and Speckle noise is denoted as \(\sigma \). Specifically, zero-mean Gaussian noise with variance \({\sigma ^2}\), Salt & Pepper noise with noise density \(\sigma \), and Speckle noise with standard deviation \(\sigma \) are each added to the 10,000 testing images. The experimental results are shown in Table 7. When the noise parameter \(\sigma \) is 0.1, the proposed RC system exhibits the strongest resistance to Speckle noise and the weakest resistance to Salt & Pepper noise. When the noise parameter \(\sigma \) is 0.01, the recognition accuracy under Gaussian noise, Salt & Pepper noise, and Speckle noise decreases by only 0.9\(\%\), 1.9\(\%\), and 0.6\(\%\), respectively. Moreover, as the noise parameter \(\sigma \) decreases further, the recognition accuracy becomes almost unaffected by the noise. The experimental results demonstrate that the proposed RC system retains a high recognition accuracy even when the test images are disturbed by noise, verifying its robustness.
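The three noise models can be sketched as follows. This is a minimal illustration under stated assumptions: pixel intensities are normalised to \([0, 1]\), noisy values are clipped back to that interval, Salt & Pepper noise sets a corrupted pixel to 0 or 1 with equal probability, and Speckle noise is multiplicative (\(p + p\,n\) with zero-mean Gaussian \(n\)). The function names and the flat-list image representation are hypothetical.

```python
import random


def clip01(x):
    """Clamp a value to the assumed pixel range [0, 1]."""
    return min(1.0, max(0.0, x))


def add_gaussian(pixels, sigma, rng):
    # Additive zero-mean Gaussian noise with standard deviation sigma.
    return [clip01(p + rng.gauss(0.0, sigma)) for p in pixels]


def add_salt_pepper(pixels, density, rng):
    # Each pixel is corrupted with probability `density`,
    # becoming salt (1.0) or pepper (0.0) with equal probability.
    out = []
    for p in pixels:
        if rng.random() < density:
            out.append(1.0 if rng.random() < 0.5 else 0.0)
        else:
            out.append(p)
    return out


def add_speckle(pixels, sigma, rng):
    # Multiplicative noise: p + p * n, with n ~ N(0, sigma^2).
    return [clip01(p + p * rng.gauss(0.0, sigma)) for p in pixels]


rng = random.Random(0)
clean = [0.5] * 784  # flattened 28x28 stand-in for a test image
noisy_g = add_gaussian(clean, 0.1, rng)
noisy_sp = add_salt_pepper(clean, 0.1, rng)
noisy_s = add_speckle(clean, 0.1, rng)
```

Under these assumptions, Speckle noise scales with the pixel intensity and leaves dark background pixels nearly unchanged, whereas Salt & Pepper noise replaces pixels outright regardless of their original value.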

Table 8 Comparison results with Fit-DNN

4.5 Feasibility verification and comparative analysis for more complex image recognition tasks

Real-world images exhibit variable perspectives, different scales, background interference, and intra-class differences, which make their recognition a challenging problem. In this section, the proposed RC system is applied to real-world color image recognition tasks, and its feasibility is verified on two datasets: CIFAR-10 and Cropped SVHN.

Following the original training/testing split of the CIFAR-10 and Cropped SVHN datasets, with the mapping voltage range \([{V_{\min }},{V_{\max }}]\) set to \([ - 1.6, - 0.8]\) and \([ - 1.3, - 1.2]\), the recognition accuracy of the proposed RC system is 57.81\(\%\) and 80.84\(\%\), respectively. In contrast, the cross-point arrays-based quasi-static memdiode model achieved recognition accuracies lower than 30\(\%\) on these two datasets [1]. The recognition accuracy of hierarchical memcapacitive RC is below 17.5\(\%\) on the CIFAR-10 dataset [33], while the recognition accuracy of memristive deep delayed feedback based RC is 78.5\(\%\) on the Cropped SVHN dataset [6]. Table 8 compares the proposed RC with Fit-DNN [30], which is also based on a single neuron node. The recognition accuracy of the proposed RC on the CIFAR-10 dataset is 5.13\(\%\) higher than that of Fit-DNN, while it is only 0.66\(\%\) lower on Cropped SVHN. In terms of input data, the proposed method first extracts a 1\(\times \)936 HOG feature descriptor from each input image, so that the input layer has only 936 nodes, while Fit-DNN directly takes the 3072 original image pixel values as input. In terms of hidden layer structure, the proposed method uses a single hidden layer with 936 nodes, while Fit-DNN uses three hidden layers with 400 nodes each. In terms of feedback, the proposed RC removes the feedback loop, while Fit-DNN uses 100 feedback loops. Therefore, the experimental results of the proposed RC are competitive on the CIFAR-10 and Cropped SVHN datasets.

In addition, the stability of the proposed RC system for real-world image recognition is further verified by changing the number of HOG feature descriptors. The experimental results in Table 9 show that when the number of HOG feature descriptors is 3276, 1560, and 936, the recognition accuracies of the proposed RC system on the CIFAR-10 and Cropped SVHN datasets remain above 57\(\%\) and 80\(\%\), respectively. It can be seen that the proposed RC system tolerates a wide range of this parameter and shows stable recognition performance.

Table 9 Recognition accuracy corresponding to different numbers of HOG feature descriptors

The experimental results in this subsection show that the proposed RC system still achieves good recognition accuracy and stable recognition performance on more complex real-world image recognition tasks, which further verifies its application value in real-world image recognition.

5 Conclusions

On the basis of the time-delayed RC system, this work proposes a nonmasking-based RC system with a single dynamic memristor by fully utilizing the intrinsic nonlinear characteristics of the memristor. The proposed RC system uses only one dynamic memristor and removes the masking procedure and feedback loop of the conventional time-delayed RC system, thereby simplifying its structure. Specifically, the HOG features extracted from input images are linearly mapped into voltage sequences and directly injected into the dynamic memristor. By appropriately adjusting the mapping voltage range, the proposed RC system demonstrates excellent performance in image recognition tasks for both the MNIST and Fashion-MNIST datasets. High recognition accuracies are also achieved on the more complex image datasets CIFAR-10 and Cropped SVHN. This provides a simple and efficient approach to balancing the image recognition accuracy and the hardware implementation complexity of the system. Furthermore, the proposed nonmasking-based RC system exhibits shorter training time and stronger noise robustness in computationally intensive image recognition tasks, and can serve as an effective solution for resource-limited computing platforms such as IoT devices.

Image recognition based on deep neural networks has far surpassed traditional machine learning algorithms in terms of accuracy and efficiency. To facilitate the hardware implementation of deep neural networks, in future research we will further exploit the nonlinear characteristics of novel materials or devices to simplify the structure of representative deep neural networks. Doing so may pave the way for in-memory computing.