Abstract
Today, the performance and operational efficiency of computer systems for digital image processing are degraded by the increasing complexity of image processing tasks. It is also difficult for image processors based on complementary metal–oxide–semiconductor (CMOS) transistors to keep increasing their integration density, owing to underlying physical restrictions and economic costs. However, such obstacles can be overcome by non-volatile resistive memory technologies (known as memristors), thanks to their compact area, high speed, low power consumption, high efficiency, and in-memory computing capability. This review begins by presenting image processing methods based on pure algorithms and conventional CMOS-based digital image processing strategies. Subsequently, the current issues faced by digital image processing and the strategies adopted to overcome them are discussed. State-of-the-art memristor technologies and their challenges in digital image processing applications are also introduced, including memristor-based image compression, memristor-based edge and line detection, and voice and image recognition using memristors. Finally, this review envisages the prospects for the successful implementation of memristor devices in digital image processing.
Introduction
Digital image processing technology, a technique for processing image information with computers or real-time hardware, mainly involves image coding and compression, image enhancement and restoration, image segmentation, image recognition, and so on. Common algorithms include the familiar single/multi-scale retinex algorithm for image enhancement [30], which does not require a training process.
However, the architectures mentioned above involve complex software algorithms, which add processing time and reduce system efficiency. Researchers have therefore drawn inspiration from the human visual system (HVS) to establish novel digital image processing architectures. The HVS integrates perception and processing: the retina filters or suppresses noise and enhances target features, and the visual cortex then performs parallel high-level image processing [11, 12, 41,42,43]. On general-purpose digital processors, many image processing applications require a large number of operations per second, even though they do not require floating-point precision [44]. In a memristor-based image processing network, the processing time and the number of iterations required by the program are directly reduced thanks to the fast switching speed and low power consumption of the memristor, which can not only store information but also compute on and process it.
Algorithms that incorporate memristor behavior affect digital image processing because they introduce nonlinear effects into the algorithms, which may lead to more complex and diverse processing results. In addition, some complicated algorithms require considerable computational resources, resulting in slower operation and the need to optimize the algorithm or use better computer hardware. It is therefore necessary to consider carefully how memristor behavior affects the processing results when applying it to digital image processing algorithms, and to select appropriate processing methods to control and adjust these effects to meet specific image processing demands. In terms of results, the quality of images processed in this way is not optimal for images generated under special conditions (e.g., poor lighting or excessive noise) or for large-scale image data. Therefore, to reduce power consumption and training costs, hardware digital image processing architectures based on memristor networks, which enable massive parallelism and minimize data transfers, have emerged.
The memristor is the crucial component for reproducing the enhancement and inhibition effects of an analog visual system. The principle of lateral inhibition of biological neurons is shown in Fig. 1a. When a neuron is excited by a stimulus and a neighboring neuron is also stimulated, the excitation of the neighboring neuron has an inhibitory effect on the former, and this feature coincides with the properties of the memristor. The memristor, introduced by L. Chua in 1971 [45] from a completeness argument in circuit theory, is the fourth elementary two-terminal circuit element, characterized by a nonlinear constitutive relationship between flux and charge. It was first demonstrated experimentally by Strukov et al. in 2008 in nanoscale metal oxides [46]. Moreover, memristors have favorable write energy and standby power: the majority of phase-change memory (PCM) devices and resistive random access memories (RRAM) have write energies of about 10–100 pJ and 100 fJ–10 pJ [47], respectively. Several studies have shown that the computational energy efficiency of memristors exceeds that of today's graphics processing units by two orders of magnitude [48]. Here, the enhancement of a 3 × 3 grayscale image is used to explain how a memristor array structure is employed for hardware digital image processing, as shown in Fig. 1b. A two-dimensional image consists of many pixels. For a grayscale image, the grayscale value of each pixel is mapped to an input voltage (current) of the array; the output voltage (current) is obtained through vector operations or interactions (enhancement or suppression) between the memristors in an array of equal size; and the inverse mapping is then applied, i.e., the output voltage or current is mapped back to grayscale values from 0 to 255, yielding the processed result.
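The three steps described above (grayscale to voltage, array operation, signal back to grayscale) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the circuit of Fig. 1b: the maximum voltage `V_MAX`, the signed weight matrix `W` (self-excitation with neighbor inhibition) and the linear rescaling of the output are all hypothetical choices, and the array is treated as ideal and noise-free.

```python
import numpy as np

V_MAX = 1.0  # assumed maximum input voltage in volts (hypothetical value)

def gray_to_voltage(img):
    """Map 8-bit grayscale values (0..255) linearly to voltages (0..V_MAX)."""
    return img.astype(float) / 255.0 * V_MAX

def to_gray(signal):
    """Rescale an output signal linearly back to the 0..255 grayscale range."""
    lo, hi = signal.min(), signal.max()
    if hi == lo:  # flat output: avoid division by zero
        return np.zeros(signal.shape, dtype=np.uint8)
    return np.round((signal - lo) / (hi - lo) * 255.0).astype(np.uint8)

def enhance(img, weights):
    """One pass through an idealized array: output = weights @ input voltages."""
    return to_gray(weights @ gray_to_voltage(img))

# Assumed effective weights: each row excites itself and inhibits its neighbors.
# Negative entries are an idealization of how inhibition would be realized.
W = np.array([[ 1.0, -0.3,  0.0],
              [-0.3,  1.0, -0.3],
              [ 0.0, -0.3,  1.0]])

image = np.array([[100, 110, 100],
                  [110, 160, 110],
                  [100, 110, 100]], dtype=np.uint8)

enhanced = enhance(image, W)
print(enhanced)
```

In this toy case the bright center pixel is pushed further away from its surroundings, which is the qualitative effect of lateral inhibition; a physical array would add device nonlinearity and variability that this sketch ignores.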
In memristor-based image processing networks, the fast switching speed and low power consumption of memristors directly reduce the image processing time and the number of iterations required by the program, because memristors can not only store information but also compute on and manipulate it.
In this article, we focus on helping the reader understand the current status of memristor devices and of image processing based on memristor circuits. We present recent research on the application of memristors in hardware image processing and compare pure software image processing with memristor-based image processing. Their advantages, disadvantages, and existing problems are subsequently analyzed. This paper is divided into four parts. “Memristor” section describes the theory of memristors and presents the reasons for their application in neuromorphic computing. “Image quality assessment metrics” section introduces several commonly used image evaluation metrics to facilitate later comparisons of the effects of memristor-based hardware digital image processing. “Discussion” section lists the current research on the application of memristor-based circuits in various aspects of image processing. Finally, we conclude with a discussion of the prospects and open questions for hardware digital image processing and summarize the work of this paper.
Memristor
With the continuous development of big data, the Internet of Things, artificial intelligence and other technologies, a new computing system that can deal with dense data is urgently needed. The human brain can process and store data simultaneously, thus reducing energy consumption and greatly improving computing efficiency. Therefore, building brain-like operations and developing intelligent brain-like devices is an essential breakthrough in AI research [49]. Researchers at HP Labs have experimentally confirmed that memristors are a new type of nonlinear two-terminal nanoscale component with switching characteristics, memory capability, and continuous input and output properties [46]. Owing to their inherently analog inputs and outputs, memristor-based memories can allow for higher accuracy than conventional binary memories. Unlike dynamic random access memory, memristors maintain their state after power loss, making memristor-based memories non-volatile [50, 51]. Notably, the combination of memristors and nanowire crossbar interconnects has become a topic of great interest to researchers [52, 53]. The memristor crossbar array structure combines the high storage density, high precision and fast access speed of memristors with the massively parallel processing of crossbar arrays, giving the structure strong information processing capabilities and easy compatibility with very-large-scale integration (VLSI) circuits. Considering the advantages above, it has broad application prospects in arithmetic operation, mode comparison, information processing, and virtual reality. This section introduces the memristors commonly applied in the hardware architecture of digital image processing and their working mechanisms. At the end of this section, we also summarize the electrical performance of memristors with different structures at the current stage (Table 2).
Memristors for image processing
A memristor is a nonlinear resistor with memory whose resistance depends on the amount of charge or magnetic flux that has passed through it. In 1971, Chua [45, 54] theoretically proposed the memristor (short for memory resistor) based on a symmetry argument in circuit theory. Chua defined the memristance (the resistance of a memristor) as the ratio between the change in magnetic flux φ and the change in charge q passing through the memristor, i.e., \(M = {\text{d}}\varphi /{\text{d}}q\) (Fig. 2a). Because φ and q are the time integrals of voltage and current, respectively,
\(M(q) = \frac{{\text{d}}\varphi /{\text{d}}t}{{\text{d}}q/{\text{d}}t} = \frac{V(t)}{I(t)}\)  (1)
Fig. 2 a Four basic circuit elements and their respective relationships. b A typical hysteresis loop of the memristor. c Diagram illustrating the structure of a neuromorphic crossbar comprised of memristor synapses and CMOS neurons [58]. d TEM cross-section of the Ta/HfO2/Pt device. Measurements were run with the top Ta electrode biased and the bottom Pt electrode grounded [59]. e Typical I–V curve showing resistive switching behavior, with black arrows indicating the device switching direction [59]. f High- and low-resistance states demonstrated for devices over 120 billion switching cycles with −3.05 V/100 ns RESET and 1.3 V/100 ns SET pulses [59]. g Retention testing of eight different levels at 150 °C (> 10⁴ s) confirmed the non-volatile characteristics and demonstrated the device's suitability for multi-level memory [59]. h 220 enhancement/inhibition epochs were realized, each comprising 39 pulses [59]. i Device structure and cross-sectional TEM image of the Ag–TiO2 nanocomposite-based memristor [35]. j Schematic of optically gated, electrically driven synaptic modulation [35]. k I–V curve of a memristor device after 15 min of exposure to visible light [35]. l Long-term conductance potentiation and inhibition stimulated by 50 positive/negative pulses (± 2 V, 50 ms) [35]
This equation shows that the unit of M is the same as that of resistance, i.e., the ohm (Ω). In 1976, Chua and Kang elucidated the strong dependence of memristive systems on the implementation of state variables and provided a generalized definition of memristive systems derived from memristors [54], which can be mathematically defined as:
\(v = R(w,i)\,i\)  (2)
\({\text{d}}w/{\text{d}}t = f(w,i)\)  (3)
where w is an internal state variable, and in general R and f are explicit functions of time. If an arbitrary periodic voltage (current) signal is applied to an ideal memristor and the response current (voltage) is plotted against the excitation, a pinched hysteresis loop shaped like a slanted figure eight is obtained, as shown in Fig. 2b, which Chua used as the landmark criterion for memristive behavior [55]. This definition was eventually refined in Chua's later publications [56, 57]. The pinched current–voltage (i–v) hysteresis loop has also become the most representative feature of the memristor. The shape of the loop varies with the amplitude and frequency of the input waveform, but the common feature is that voltage and current cross zero together, so the loop passes through the origin in every cycle. Meanwhile, direct experimental support for memristor neuromorphic systems, such as spike-timing-dependent plasticity (STDP), originated from a hybrid system of memristor synapses and CMOS neurons (Fig. 2c).
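The pinched hysteresis loop can be reproduced numerically with a simple state-dependent resistor. The sketch below assumes the linear ion-drift model commonly associated with the device reported in [46], in which the resistance interpolates between `R_ON` and `R_OFF` according to a normalized state `x = w/D` that drifts in proportion to the current. All parameter values are illustrative assumptions, not data from this review, and the forward Euler integration is chosen only for brevity.

```python
import numpy as np

# Assumed device parameters (illustrative values, not taken from the review)
R_ON, R_OFF = 100.0, 16e3   # resistance when fully doped / undoped, in ohms
D = 10e-9                   # film thickness in m
MU_V = 1e-14                # dopant mobility in m^2 V^-1 s^-1

def simulate(v_amp=1.0, freq=1.0, cycles=2, steps=20000, x0=0.1):
    """Integrate the linear ion-drift model under a sinusoidal voltage."""
    t = np.linspace(0.0, cycles / freq, steps)
    dt = t[1] - t[0]
    v = v_amp * np.sin(2 * np.pi * freq * t)
    i = np.zeros_like(t)
    x = x0                                      # normalized state x = w / D
    for k in range(steps):
        r = R_ON * x + R_OFF * (1.0 - x)        # state-dependent resistance
        i[k] = v[k] / r                         # v = R(w) * i
        x += MU_V * R_ON / D**2 * i[k] * dt     # dw/dt proportional to i
        x = min(max(x, 0.0), 1.0)               # keep the state inside the film
    return t, v, i

t, v, i = simulate()
# Same voltage on the rising and falling branch, different current: hysteresis
k_up = int(np.argmin(np.abs(t - 0.125)))
k_down = int(np.argmin(np.abs(t - 0.375)))
print(v[k_up], i[k_up])
print(v[k_down], i[k_down])
```

Plotting `i` against `v` gives the figure-eight loop; because the current is always `v / r` with a finite `r`, it vanishes whenever the voltage does, which is what pins the loop to the origin. Raising `freq` leaves less time for the state to drift in each half cycle, so the lobes shrink, in line with the frequency dependence noted above.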
Here we focus on two typical memristor structures applied in digital image processing and on their device performance. Jiang et al. reported a Ta/HfO2/Pt memristor (Fig. 2d) [59] with low programming voltages (Fig. 2e), fast switching speeds (≤ 5 ns), high endurance (120 billion cycles) (Fig. 2f) and reliable retention (≫ 10 years, extrapolated at 85 °C). In addition, potentiation and depression were demonstrated over 220 epochs (Fig. 2h), indicating that the device can be used for multi-level non-volatile memories (Fig. 2g) and neuromorphic computing applications. Shan et al. developed a plasmonic optoelectronic memristor [35] (Fig. 2i) that relies on optical excitation in an Ag–TiO2 nanocomposite film and the effects of localized surface plasmon resonance (LSPR). Fully light-induced and light-gated synaptic plasticity functions were achieved in a single device (Fig. 2j), including reversible synaptic potentiation/depression under visible and ultraviolet illumination and modulation of the STDP learning rule (Fig. 2k, l), which can be utilized for visual sensing and low-level image pre-processing (including contrast enhancement and noise reduction).
Working mechanism
The mechanism lies in the fact that synapses are intrinsically two-terminal devices, which share a striking similarity with memristive devices [45, 46]. The advantage of this structure is that it can potentially provide connectivity and functional density comparable to biological systems, rather than operating in the manner of a digital computer [60]. These devices consist of a simple metal–insulator–metal (MIM) layer structure. The forming process creates localized conducting filaments, and the movement of these filaments leads to discrete and abrupt resistive switching characteristics [51, 61,62,63,64]. Specifically, the switching kinetics dominated by anion migration in semiconductors can be understood as follows. The p-type storage medium contains some mobile oxygen ions, as schematically illustrated in Fig. 3a-i. When the top electrode (TE) is positively biased, these mobile oxygen ions migrate toward the TE and accumulate near it, creating a large number of cation vacancies near the TE (Fig. 3a-ii). Once a complete p-type semiconducting conducting filament (CF) is formed, the device switches to the low-resistance state (LRS) (Fig. 3a-iii). When the TE is negatively biased, most of the Joule heat is generated at the thinnest part of the CF, greatly accelerating the movement of oxygen ions in that region. Driven by the electric field, these oxygen ions rapidly migrate toward the bottom electrode (BE); as a result, the concentration of cation vacancies at the thinnest part of the p-type CF is significantly reduced and the CF ruptures there, at which point the device is in the high-resistance state (HRS) (Fig. 3a-iv). When two dynamic metal (Pt)/semiconductor (TiOx) junctions are operated in series, a range of device states occurs.
Fig. 3 a Schematic of anion-migration-dominated switching kinetics in p-type semiconductors. (i) The initial state with a random distribution of mobile oxygen ions. (ii) The nucleation and subsequent growth of p-type CFs composed of cation vacancies from anode to cathode during the forming process. (iii) The LRS, with a complete CF whose thinnest region lies near the cathode. (iv) The HRS, with the CF ruptured at its thinnest region [65]. b Schematic of the BMThCE-based device and the chemical structures of the photochromic diarylethene (UV: ultraviolet light; VIS: visible light) [35]. c I–V characteristics of the BMThCE-based memories ITO/o-BMThCE/Al and d ITO/c-BMThCE/Al [35]
Slightly different from electrically induced resistive switching (RS) memories, the physical mechanisms of optical effects in optical memristors include photovoltaic effects and light-induced chemical reactions or configuration changes. Photovoltaic effects typically involve the creation of free carriers, the separation of photogenerated electron–hole pairs, and the generation of voltages or currents from incident photons [66]. The separation of electron–hole pairs is highly correlated with the Schottky barrier between the metal and the semiconductor or with the internal electric field induced at a heterojunction interface [67]. This causes the holes to move toward the positive electrode and the electrons toward the negative electrode, which subsequently delivers charge to the external circuit and generates an open-circuit voltage. Photochemical reactions entail photon absorption, which excites molecules and causes chemical changes such as ionization and isomerization [68] (Fig. 3b). The photo-induced switching behavior is tightly linked to conformational changes within the photoactive material, which may lead to changes in chemical bonds and energy bands. The photo-induced transition between conformational structures has a remarkable impact on the RS type as the energy levels change, which can modulate the device performance in a precise and energy-efficient manner (Fig. 3c,d).
Reportedly, memristors respond to both light and electrical stimuli [69,70,71,72]. Neuromorphic computing implementations in the electrical and optical domains require a full combination of the integrated processing power of the electrical domain with the low energy consumption and high bandwidth of the optical domain. Owing to their particular characteristics, memristors can act as both state modulators and photodetectors, capable of processing both electrical and optical signals. A common way to realize synaptic or neuronal behavior is to modulate the memristor state, i.e., its resistance or optical transmittance, with electrical and optical programming signals. In addition, the programming input and the readout signal can lie in different domains, enabling direct conversion between optical and electrical signals, which is extremely attractive. For example, an electrical (optical) signal can change the optical (electrical) signal in a state modulator (photodetector).
Image quality assessment metrics
Image quality assessment metrics play an important role in various image processing applications. Digital images suffer from various distortions during acquisition, processing, compression, storage, transmission and reproduction, any of which may lead to a degradation of visual quality. Image quality assessment metrics can be used to optimize the algorithms and parameter settings of image processing systems, to benchmark these systems, and to monitor and adjust image quality dynamically. Two types of metrics exist for assessing image quality: subjective and objective. They are briefly described below.
Subjective image quality assessment metrics
Subjective assessment, also called subjective evaluation, evaluates the quality of an image through the subjective perception of a human observer and most truly reflects human visual perception. Common subjective evaluations are absolute and relative evaluations. The former involves observers rating the image to be evaluated against the original image, whereas the latter involves observers comparing the given images with one another based on their own subjective impressions, without any reference. The final evaluation score for both methods is the average of the individual evaluation scores.
The subjective evaluation criterion uses the Mean Opinion Score (MOS):
\({\text{MOS}} = \frac{{\sum\nolimits_{i = 1}^{K} {N_{i} S_{i} } }}{{\sum\nolimits_{i = 1}^{K} {N_{i} } }}\)  (4)
where \(i \in \left\{ {1,2, \ldots ,K} \right\}\) is the evaluation level of the observer, \(S_{i}\) is the evaluation score corresponding to level i, and \(N_{i}\) is the number of evaluators who assigned that score.
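As a small worked example, the MOS can be computed as the average of the level scores weighted by the number of evaluators at each level. The sketch assumes this usual weighted-average form; the five-level scale and the rater counts are hypothetical values chosen only for illustration.

```python
def mean_opinion_score(scores, counts):
    """Average of the level scores S_i weighted by the rater counts N_i."""
    if len(scores) != len(counts):
        raise ValueError("scores and counts must have the same length")
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one rating is required")
    return sum(s * n for s, n in zip(scores, counts)) / total

# Hypothetical five-level scale (1 = bad ... 5 = excellent) rated by 20 observers
levels = [1, 2, 3, 4, 5]
raters = [1, 2, 5, 8, 4]
print(mean_opinion_score(levels, raters))  # (1 + 4 + 15 + 32 + 20) / 20 = 3.6
```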
Objective image quality assessment metrics
Unlike subjective assessment, objective evaluation assesses the quality of an image by establishing a mathematical model that scores aspects such as image texture, sharpness and focus and computes a result, which is intended to reflect the human eye's subjective perception of the image. It can be divided into full-reference, reduced-reference and no-reference image quality assessment methods according to whether a corresponding reference image is available [87]. This section presents several common objective image quality assessment metrics, which are as follows.
Mean Square Error (MSE): the expected value of the squared difference between the true and estimated values of a parameter. Assume that the reference image is f, the image to be measured is g, and both images have size M × N. The grayscale values of the pixels are denoted f(i, j) and g(i, j), and the mean squared error can be expressed as:
\({\text{MSE}} = \frac{1}{MN}\sum\limits_{i = 1}^{M} {\sum\limits_{j = 1}^{N} {\left( {f(i,j) - g(i,j)} \right)^{2} } }\)  (5)
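Peak Signal to Noise Ratio (PSNR): the ratio of the maximum power of a signal to the power of the noise. The larger the value, the smaller the distortion. The formula for calculating the PSNR is shown in Eq. (6):
\({\text{PSNR}} = 10\log_{10} \left( {\frac{{255^{2} }}{{{\text{MSE}}}}} \right)\)  (6)
where 255 is the maximum grayscale value of an 8-bit image.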
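The two full-reference metrics above translate directly into code. The following sketch assumes 8-bit grayscale images, so the peak value is 255, and uses the standard form PSNR = 10 log10(peak² / MSE); the function names and the test images are illustrative choices.

```python
import numpy as np

def mse(f, g):
    """Mean squared error between reference image f and test image g."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    if f.shape != g.shape:
        raise ValueError("images must have the same size")
    return float(np.mean((f - g) ** 2))

def psnr(f, g, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(f, g)
    if err == 0.0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / err))

# Hypothetical 4 x 4 reference image and a copy brightened by 5 gray levels
reference = np.arange(16).reshape(4, 4) * 10
distorted = reference + 5
print(mse(reference, distorted))             # every pixel differs by 5, so 25.0
print(round(psnr(reference, distorted), 2))  # 10 * log10(65025 / 25)
```

A uniform offset of 5 gray levels gives a PSNR of roughly 34 dB; larger errors lower the PSNR, which matches the statement that a larger value means smaller distortion.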
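Structural Similarity (SSIM): a well-known quality metric developed by Wang et al. [87] for measuring the similarity between two images. It is thought to be associated with the quality perception of the HVS. SSIM is designed to model any image distortion as a combination of three factors: loss of correlation, contrast distortion, and luminance distortion. The SSIM is defined as:
\({\text{SSIM}}(f,g) = l(f,g)\,c(f,g)\,s(f,g)\)  (7)
where
\(l(f,g) = \frac{{2\mu_{f} \mu_{g} + C_{1} }}{{\mu_{f}^{2} + \mu_{g}^{2} + C_{1} }},\quad c(f,g) = \frac{{2\sigma_{f} \sigma_{g} + C_{2} }}{{\sigma_{f}^{2} + \sigma_{g}^{2} + C_{2} }},\quad s(f,g) = \frac{{\sigma_{fg} + C_{3} }}{{\sigma_{f} \sigma_{g} + C_{3} }}\)  (8)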
Note that C1, C2 and C3 are positive constants that prevent the denominators from being 0, and σfg is the covariance between f and g. The first term in (8) is the luminance comparison function, which indicates how close the average brightness of the two images (μf and μg) is. This factor reaches its maximum of 1 only if μf = μg. The second term is the contrast comparison function, which measures how close the contrast of the two images is, where contrast is measured by the standard deviations σf and σg. This term reaches its maximum of 1 only when σf = σg. The last term is the structure comparison function, which represents the correlation coefficient between the two images f and g. Hence, the positive value range of SSIM is [0, 1], where a value of 1 means that f = g and 0 means no correlation between the images.
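The three comparison terms can be evaluated with a few lines of NumPy. The sketch below assumes the usual product form SSIM = l · c · s and, for brevity, computes the statistics once over the whole image instead of over local sliding windows as is common in practice. The constants follow the widely used convention C1 = (0.01 · 255)², C2 = (0.03 · 255)² and C3 = C2 / 2, which the text above does not specify and is therefore an assumption.

```python
import numpy as np

def ssim_global(f, g, peak=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM: luminance * contrast * structure over the whole image."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    c1 = (k1 * peak) ** 2          # assumed stabilizing constants
    c2 = (k2 * peak) ** 2
    c3 = c2 / 2.0
    mu_f, mu_g = f.mean(), g.mean()
    sigma_f, sigma_g = f.std(), g.std()
    sigma_fg = np.mean((f - mu_f) * (g - mu_g))   # covariance of f and g
    lum = (2 * mu_f * mu_g + c1) / (mu_f ** 2 + mu_g ** 2 + c1)
    con = (2 * sigma_f * sigma_g + c2) / (sigma_f ** 2 + sigma_g ** 2 + c2)
    stru = (sigma_fg + c3) / (sigma_f * sigma_g + c3)
    return float(lum * con * stru)

# Hypothetical 8 x 8 gradient image, a brightened copy and its negative
ramp = np.arange(64).reshape(8, 8) * 4.0
print(ssim_global(ramp, ramp))          # identical images
print(ssim_global(ramp, ramp * 0.5))    # lower brightness and contrast
print(ssim_global(ramp, 252.0 - ramp))  # inverted structure
```

Identical images give all three terms equal to 1. The inverted image makes the covariance negative, so the structure term and the product drop below zero, which is why the text restricts attention to the positive range when interpreting the score.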
Applications of memristor in digital image processing
Memristors have been widely employed to emulate artificial synapses because of their complex analog behavior since the rediscovery of the reversible resistive switching effect. Meanwhile, memristors can also be integrated with CMOS logic devices to serve as programmable switches [88], logic units [108,109,110]. (iii) Images obtained using the proposed 2D DCT reconstruction [111]. h Schematic illustration of a proposed physical crossbar array implementation and read circuitry [112]. i From top to bottom: the initial 50 k-byte image used as the simulated input, the intermediate image representing the fuzzy logic level processing, and the final 25 k-byte image after mapping back to the binary bitmap [112]
The discrete cosine transform (DCT) offers superior energy compaction, but the overall calculation is relatively complicated, which increases the computational burden. Compared with DCT, the discrete wavelet transform (DWT) exhibits a higher peak signal-to-noise ratio (PSNR) and faster image compression. However, traditional image compression methods, such as JPEG2000, require complex hardware to implement the calculation process. Therefore, reducing computing energy and required area while preserving image quality has become a research hotspot.
Li et al. proposed a large-scale memristor crossbar for analog computing [108, 110] and achieved image compression with an array of up to 128 × 64 hafnium oxide (HfO2) memristors [59], which realizes analog vector–matrix multiplication (VMM) with sufficient accuracy, high speed and high energy efficiency. The proposed memristor array structure is presented in Fig. 5b, where the researchers construct a "1T1R" cell, i.e., each cell monolithically integrates a memristor on top of a metal–oxide–semiconductor transistor that serves as the access device, so that the conductance of each memristor in the crossbar can be adjusted precisely. The original image to be compressed was input into the array for pre-processing (Fig. 5c), the input voltages were applied across the programmed memristor conductances, and vector–matrix multiplication was performed (Fig. 5d); the compression results of software and hardware are compared in Fig. 5e, f. The advantage of this framework is that the memristor crossbar VMM hardware can directly process the analog signal acquired from the sensor, without additional peripherals such as analog-to-digital converters (ADCs) and the extra time and energy they consume. In addition, if only specific features need to be detected in the signal, it can provide threshold gating circuits at considerably lower latency and energy cost. This flexibility, along with low latency and high energy efficiency, makes analog crossbar computing well suited to diverse edge and IoT computing tasks.
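Transform-based compression maps naturally onto vector–matrix multiplication, because a 2D DCT of a block is just two matrix products. The sketch below is a software illustration under stated assumptions, not the implementation of [108, 110]: it uses an ideal orthonormal 8 × 8 DCT matrix, keeps only the `keep` × `keep` lowest-frequency coefficients (a hypothetical truncation rule), and ignores conductance quantization, noise and the differential encoding that signed matrix entries would need in a physical crossbar.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix C, so that coefficients = C @ block @ C.T."""
    k = np.arange(n).reshape(-1, 1)   # frequency index (rows)
    m = np.arange(n).reshape(1, -1)   # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)        # DC row has a different scale factor
    return c

def compress_block(block, keep=4):
    """Transform a square block, keep the low-frequency corner, transform back."""
    c = dct_matrix(block.shape[0])
    coeffs = c @ block @ c.T          # two matrix products, i.e., two VMM passes
    mask = np.zeros_like(coeffs)
    mask[:keep, :keep] = 1.0          # discard high-frequency coefficients
    return c.T @ (coeffs * mask) @ c  # inverse transform of the kept part

# Hypothetical smooth 8 x 8 test block (a brightness gradient)
x, y = np.meshgrid(np.arange(8), np.arange(8))
block = 100.0 + 10.0 * x + 5.0 * y

for keep in (8, 4, 2):
    error = np.mean((compress_block(block, keep) - block) ** 2)
    print(keep, error)
```

Because the transform is orthonormal, the reconstruction error equals the energy of the discarded coefficients, so it can only grow as `keep` shrinks. In a crossbar, the entries of `C` would be stored as conductances and each column of the block applied as a voltage vector.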
To overcome the drawback that serial computations are vulnerable to errors, Zhang et al. [111] fundamentally rethought how to implement image compression using resistive crossbar arrays (RCAs). The key idea is to reorganize the computation so that it natively matches the characteristics of the underlying resistive hardware, while the spectral optimization, quantization optimization, and 2D DCT reconstruction techniques improve robustness to errors for high-speed and efficient small-module processing. Simulation results showed that the quality of image processing was significantly improved (Fig. 5g), while latency and power were reduced by 21% and 62%, respectively, facilitating the large-scale use of RCAs where cost reduction is required.
We compared multiple image compression techniques on two different datasets (the Berkeley segmentation dataset and a standard dataset) in terms of compressed quality parameters (MSE, PSNR, and SSIM), compression ratio, latency, power, and area, as shown in Table 3. Here, D stands for direct mapping [108, 110]. D-P stands for the D method with a pipelined implementation to maximize throughput [109]. R stands for the proposed framework applied only with 2D DCT reconstruction. RF stands for the R method extended by spectral optimization. RFQ stands for the RF method extended by hardware-friendly quantization. The normalized performance is shown in bold in the table, and according to Eqs. (5), (6), (7), high image quality is characterized by a lower MSE and higher PSNR and SSIM. The tabulated results show that the images produced by the proposed model look very similar to the human eye despite a slightly higher MSE and slightly lower PSNR and SSIM. Compared to previous work in [108, 110], the image quality is improved while the latency and power consumption are reduced by 51% and 24% or 3% and 61%, respectively.
Currently, integrated circuits that perform mathematical operations in artificial visual perception and image processing are mainly built from traditional digital logic gates. However, Boolean logic operations are not the best choice for brain-like computing, given the ambiguity inherent in biological neural networks. Previous studies have exploited the optoelectronic properties of memristors and used a single optically gated memristor to build logic gates that realize logical OR and logical AND operations, but the important logical NOT operation is missing from these gates and requires a complicated procedure to perform. Berco et al. proposed a programmable photo-memristor gate [112] that can be used for image compression directly during image acquisition, so that no additional memory modules are required (Fig. 5h, i). This design significantly reduces the number of processors and memories and the time spent on interactions between them. The smallest module of the designed structure consists of two memristors and a resistor, which were used as building blocks in the design and simulation of a matrix multiplication unit that uses logical operations (NIMPLY-AND) to perform effective in-situ compression of the image. However, the framework only performed single-channel image processing, and its effectiveness for more complex image processing was not demonstrated. The photoelectric properties exhibited by memristors in this case provide a new way of thinking for the development of intelligent vision.
Image segmentation
Image segmentation, the division of target regions in an image from other regions as the name implies, is a crucial pre-processing step for image recognition and computer vision. Among segmentation methods, edge detection has become a dominant approach because gray levels change distinctly at region boundaries. Edge information is frequently utilized in image analysis, recognition and understanding. Therefore, edge detection and extraction are particularly important in image processing, and this technique is common in medical imaging, face and fingerprint recognition, traffic control systems and so on.
Edge detection and extraction contribute to clinical diagnosis. To address the shortcomings of traditional medical image fusion algorithms, Zhu et al. [34] constructed a memristive pulse-coupled neural network (M-PCNN) for medical image processing; its memristive threshold generator circuit is shown in Fig. 6a. The principle is that when an image is input to the M-PCNN, spiking neurons transmit stimuli to neighboring neurons and impel them to release pulses [114], which detects abrupt grayscale changes at edges and in turn enables edge detection. The edges obtained with the M-PCNN in medical image edge extraction were found to be clearer and richer (Fig. 6b). In addition, integrating memristors into PCNNs significantly reduces the size of the network while making it biologically functional, which may facilitate the development of hardware implementations of neural networks. The core of the architecture is the M-PCNN, which simultaneously exploits linear additive and nonlinear multiplicative coupling, and the introduction of the memristor brings the network closer to a biological neural network. However, the peripheral circuits must be redesigned to obtain different image processing results, and the impact of the peripheral circuits on each processing method cannot be ignored.
Fig. 6 a Circuit diagram of the proposed M-PCNN structure and the memristive threshold generator circuit [34]. b Extraction of edges of CT images using different methods, in order from top to bottom and from left to right: source image, LOG operator, Canny operator, and the proposed M-PCNN with different memristor parameters [34]. c (i) Flow-based computing with crossbar circuits. (ii) Crossbar design. (iii) Crossbars for edge detection on input-aware pixel pairs (median PSNR = 6 dB) [113]. d The input grayscale image, the computed edge image and the output edge image obtained via majority-based combination of approximately correct input-aware crossbar outputs, respectively [113]. e Schematic of the 3D circuits composed of high-density staircase output electrodes (blue) and pillar input electrodes (red). Side view of the 3D row banks and column side showing the unique staircase electrodes. Each row bank in the 3D array operates independently [36]. f Comparison between the hardware and software edge detection of video frames [36]
Chakraborty et al. [113] explored the design of flow-based crossbar circuits with input-aware approximate computing, producing multiple 8 × 8 crossbar circuits (Fig. 6c-iii) belonging to two groups: one performs approximate edge detection for an application-specific subset of the input values, and the other executes threshold-based edge detection for all possible pixel pairs with a high degree of accuracy (~ 85%). The outputs of the individual crossbars are combined using the majority function to yield the final output image. Figure 6c-i depicts a flow-based computation of the simple Boolean formula "a AND b" [115], where the data are loaded into the two-dimensional array of nanoscale memristors (the values of a and b in Fig. 6c-i) and the current passing through the crossbar performs the desired computation. The current flows from the rightmost nanowire to the leftmost nanowire if and only if the formula "a AND b" is true. An example of a crossbar that realizes the Boolean formula ¬A∧¬B∧¬C is shown in Fig. 6c-ii, where a green circle indicates a memristor in the ON state, a gray circle indicates a memristor in the OFF state, and a blue circle takes the value of a literal. The team tested the edge extraction performance of the architecture on the BSD500 database [116] and used the PSNR metric to assess the quality of the output image, showing that the results (Fig. 6d) obtained from the input-aware approximate computation were significantly better than those produced by the more accurate general-purpose crossbar. Although the crossbar array used for approximate computation is effective in terms of accuracy and overall quality of edge extraction, it adopts standard peripheral circuits, and the effect of these peripheral circuits on the efficiency and correctness of the overall method remains unexplored.
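The flow-based idea can be illustrated by treating the crossbar as a graph: nanowires are nodes, and every memristor in the ON state connects its row wire to its column wire. The formula is true exactly when a conducting path links the source wire to the destination wire. The sketch below is a toy model, not the design of [113] or [115]: the 2 × 1 layout used for "a AND b", the choice of source and destination wires, and the helper names are hypothetical, and the check is a plain breadth-first search without any electrical modeling.

```python
from collections import deque

def current_flows(on_cells, n_rows, n_cols, src_row, dst_row):
    """Return True if ON memristors connect row wire src_row to row wire dst_row."""
    # Nodes 0..n_rows-1 are row wires; n_rows..n_rows+n_cols-1 are column wires.
    adj = {node: set() for node in range(n_rows + n_cols)}
    for r, c in on_cells:
        adj[r].add(n_rows + c)
        adj[n_rows + c].add(r)
    seen, queue = {src_row}, deque([src_row])
    while queue:
        node = queue.popleft()
        if node == dst_row:
            return True
        for nxt in adj[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return False

def and_crossbar(a, b):
    """Hypothetical 2 x 1 crossbar: cell (0, 0) stores a, cell (1, 0) stores b."""
    on = [cell for cell, bit in (((0, 0), a), ((1, 0), b)) if bit]
    return current_flows(on, n_rows=2, n_cols=1, src_row=0, dst_row=1)

def majority(bits):
    """Combine the outputs of several crossbars by majority vote."""
    return 2 * sum(bool(b) for b in bits) > len(bits)

for a in (False, True):
    for b in (False, True):
        print(a, b, and_crossbar(a, b))
print(majority([True, True, False]))
```

In this layout, current can pass from the first row wire to the second only through the single column wire, which requires both memristors to be ON, so the path exists exactly when "a AND b" holds. The `majority` helper mirrors the way the outputs of the individual crossbars are merged into the final edge image.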
Notably, most existing memristive systems are based on 2D arrays. As the units are only connected horizontally and vertically, such a 2D design sometimes cannot accommodate the complex topology of CNNs. Li et al. designed a 3D memristor circuit that implements a complex neural network (Fig. 6e) [36], successfully extracted fine edge features using the 3D array, and again obtained comparable results between software and hardware implementations of kernel operations (Fig. 6f). They found that, despite the variability inherent in memristors, the actual processing results are comparable to those of software while offering pixel-level parallelism. The 3D array can be further expanded for parallel processing across different pixels, channels and filters over multiple convolutional layers. Compared to its 2D counterpart, this structure can conduct all computations in real time and can be vertically integrated directly with a 2D image sensor array, providing a significant speedup when running complex neural network models. This makes it promising for cloud and edge processors in IoT networks.
The randomness of ion transport in traditional oxide-based memristors introduces variability to the system, which makes it challenging for CNNs to operate in memristor arrays and degrades the learning accuracy. To overcome this challenge, researchers have developed memristors with new structures, including 2D and 3D designs. Li et al. proposed a 2D heterostructure memristor array [37]; owing to the unique physical properties of 2D materials, 2D material memristors exhibit better scalability. The team confirmed that the nine memristors in the 3 × 3 crossbar array (Fig. 7a) were able to achieve a uniform and consistent five-state mapping by adjusting the compliance current. The intensity of the original image was converted to voltage and input to the memristor array for convolution calculation to extract the edges of the image, as shown in Fig. 7c, where the hardware processing results are similar to those of software processing. Figure 7b also shows other image processing results, such as Gaussian softening, sharpening, and embossing. These results demonstrate the potential of operating CNNs in a memristor array.
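As a rough illustration of how a crossbar evaluates one convolution window, the sketch below encodes each signed Prewitt weight as a pair of non-negative conductances and sums the resulting currents. The differential-pair encoding, the unit conductance `g_unit` and all function names are assumptions made for this example, not details reported in [37]; the sliding window is also applied without flipping the kernel, as is common in CNN practice.

```python
PREWITT_X = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]  # responds to vertical edges

def to_conductance_pair(weight, g_unit=1.0):
    """Encode a signed weight as (G_plus, G_minus); hypothetical scheme."""
    return (weight * g_unit, 0.0) if weight >= 0 else (0.0, -weight * g_unit)

def crossbar_mac(voltages, kernel):
    """Ohm's law per cell (I = V * G), Kirchhoff summation per output line."""
    i_plus = i_minus = 0.0
    flat_kernel = (w for row in kernel for w in row)
    for v, w in zip(voltages, flat_kernel):
        g_plus, g_minus = to_conductance_pair(w)
        i_plus += v * g_plus
        i_minus += v * g_minus
    return i_plus - i_minus

def convolve(image, kernel):
    """Valid-mode sliding window; each window corresponds to one array read."""
    k = len(kernel)
    out = []
    for r in range(len(image) - k + 1):
        row = []
        for c in range(len(image[0]) - k + 1):
            patch = [image[r + i][c + j] for i in range(k) for j in range(k)]
            row.append(crossbar_mac(patch, kernel))
        out.append(row)
    return out

# Toy image with a vertical step edge between the second and third columns.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
edges = convolve(image, PREWITT_X)
print(edges)
```

A uniform patch yields equal positive and negative currents and therefore a zero output, which is why only intensity transitions survive as edges.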
Convolutional image processing implemented using the PdSeOx/PdSe2 memristor crossbar array. a The whole process of convolution image processing using the memristor crossbar array. b Results of image processing in five states implemented by adjusting the compliance current, mapping the weights of [-4 to 4]. c Hardware and software processed vertical and horizontal edge extractions. The Prewitt kernels are for horizontal and vertical edge detections [37]
Image enhancement and restoration
Enhancement and fusion
Image enhancement is a major branch of digital image processing. Many images are captured with poor visual quality because of the environment and other conditions, so image enhancement techniques are required to improve their visual appearance, for example by extracting characteristic parameters of target objects from digital images or highlighting certain features of target objects, which benefits the tracking, recognition and understanding of targets in images. Image enhancement technology has gradually spread to various aspects of human life and social production, such as aerospace, biomedicine, industrial production and public safety. To obtain good performance, some traditional image enhancement algorithms, such as the dehazing algorithms of Tan and Oakley [117], Tan [118], He et al. [119], Tarel et al. [120], Nishino et al. [121], Meng et al. [122], and Sulami et al. [123], must pay the cost of a relatively heavy computational burden.
To avoid, as far as possible, the complex calculations required by image enhancement, Zhu et al. [31] introduced memristor arrays into the image enhancement algorithm, processing images twice by exploiting the inherent properties of memristors. The algorithm uses a coarse transmission map together with the nonlinear memristor property, greatly reducing the computational cost, and the image quality evaluation reveals that it maintains performance comparable to that of the classical algorithm (Fig. 8a). In addition, the memristor-based image enhancement algorithm (MIEA) was found to be more efficient than the classical algorithm in terms of computational complexity and the speed of computing the fine transmission map. It takes 0.047 s on an Intel i7-9700K CPU (14 nm), roughly 90% less execution time than the 0.542 s reported in [119]. In [39, 48], the reported computational energy efficiency exceeds that of today's graphics processing units by two orders of magnitude. The presented processing fully exploits the memristor feature, but the whole framework essentially remains algorithmic. It cannot be considered complete hardware-based digital image processing because the image pre-processing using memristors is only one step in the whole structure.
a MIEA overview: Red route: fine-tuning of the memristor crossbar array using a rough image; Blue route: second fine-tuning of the architecture based on the original image. The final image was derived from current normalization [31]. b Structure of the device and 32 × 128 fabricated memristor array [32]. c Memristive array hardware system applied to image processing. (i) The original DCT matrix; (ii) Array read current after processing by the original DCT matrix; (iii) the programming error matrix of (ii) [32]. d Specific image processing flow with Ag2S memristor arrays: encoding 3 × 3 convolution kernel values as array inputs, recording post-synaptic currents from the bottom electrodes after the multiply-accumulate computation (MAC) operation as outputs, and mapping them to image grayscale values [33]. e Group 1: Sharpening operation. (i) Result of software-based simulation, hardware outputs of filament-type memristor (FTM) (ii) and interface-type memristor (ITM) (iii). Group 2: Softening operation: (iv) Software simulation result, hardware outcomes of FTM (v) and ITM (vi) [33]. f The fusing structure of NSCT-based M-PCNN [34]. g Source image: left: CT image, right: MRI image [34]. h Comparison of M-PCNN (i) and PCNN (ii) fusion results. From left to right are the fused images, the difference between the fused image and the CT image, and the difference between the fused image and the MRI image [34]
Later, Zhang et al. [32] proposed an array-level enhancement method that uses flexible combinations of multiple arrays to handle different layers according to how important accuracy is for each layer. A total of 4096 1T1R cells, arranged as 32 × 128, were fabricated, as shown in Fig. 8b. Measurements show that this memristor array exhibits multi-level characteristics. The size of the discrete cosine transform (DCT) matrix was matched to the size of the memristor array, each element of the matrix was mapped to the array, the voltages were input to the rows, and the output results were generated by accumulating the currents in each row. Comparing the original DCT matrix with the current matrix yields the programming error (Fig. 8c). Because the transformation matrix is stored in the array, the image processing performance is sensitive to variations in the programmed values; the results suggest that the array-level enhancement method can reduce the variation of the programmed multi-level data.
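The notion of a programming error between an ideal DCT matrix and its stored multi-level version can be sketched as follows. The example builds an orthonormal DCT-II matrix and snaps each entry to a limited number of evenly spaced levels; the 8 × 8 size, the level counts and the uniform quantizer are illustrative assumptions and do not reproduce the device behavior measured in [32].

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    return [[math.sqrt((1 if k == 0 else 2) / n)
             * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
             for j in range(n)] for k in range(n)]

def quantize(matrix, levels):
    """Snap entries to `levels` evenly spaced values (hypothetical states)."""
    lo = min(min(row) for row in matrix)
    hi = max(max(row) for row in matrix)
    step = (hi - lo) / (levels - 1)
    return [[lo + round((x - lo) / step) * step for x in row] for row in matrix]

def max_abs_error(a, b):
    """Largest element-wise deviation between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

ideal = dct_matrix(8)
err_coarse = max_abs_error(ideal, quantize(ideal, 8))   # few device states
err_fine = max_abs_error(ideal, quantize(ideal, 64))    # many device states
print(err_coarse, err_fine)
```

With more distinguishable states per cell, the stored matrix approaches the ideal one, which is the motivation for combining several arrays when a layer needs higher precision.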
Zhu et al. [33] applied flexible Ag2S memristors to digital image sharpening and blurring, exploiting the different resistive switching mechanisms of the device (filament type and interface type). The processing principle was as follows: the original image pixel values (from 0 to 255) were linearly mapped to read voltages (amplitude from 0 to 25.5 mV), and two convolution kernels were mapped into a crossbar array (Fig. 8d) by modulating the conductance values of the devices. The structure shared a common bottom electrode so that two types of currents could be collected: the current generated by current summation and the output current generated by the multiplication of voltage and conductance. The grayscale image of the final processed result was generated from the output currents (Fig. 8e, software results in (i) and (iv) and hardware results in (ii), (iii), (v), and (vi)). After the sharpening operation, the outline of the "horse" was clearer, while the softening operation blurred the outline of the "horse" and its surroundings. However, this method inevitably relies on convolutional operations, which increases the computational complexity and requires multiple iterations of training, consuming more space in the case of larger convolutional kernels.
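The voltage encoding and multiply-accumulate step can be made concrete with a small sketch. The 0.1 mV per gray level scale follows from the stated 0 to 25.5 mV range; the sharpening kernel, the unit conductance `g_unit`, the use of signed weights in place of physical conductances, and the function names are assumptions for illustration, since the kernel values used in [33] are not given here.

```python
MV_PER_LEVEL = 0.1  # gray levels 0..255 map linearly to 0..25.5 mV

# A common 3 x 3 sharpening kernel, used here only as an example.
SHARPEN = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]

def pixel_to_mv(pixel):
    """Linear map from an 8-bit gray level to a read voltage in mV."""
    return pixel * MV_PER_LEVEL

def mac_current(patch, kernel, g_unit=1.0):
    """Sum of V * G products collected on the shared electrode."""
    return sum(pixel_to_mv(p) * w * g_unit
               for patch_row, kernel_row in zip(patch, kernel)
               for p, w in zip(patch_row, kernel_row))

def current_to_gray(current):
    """Undo the voltage scaling and clip to the 8-bit range."""
    return max(0, min(255, round(current / MV_PER_LEVEL)))

flat = [[100, 100, 100]] * 3
spot = [[100, 100, 100], [100, 120, 100], [100, 100, 100]]
print(current_to_gray(mac_current(flat, SHARPEN)))  # uniform area is unchanged
print(current_to_gray(mac_current(spot, SHARPEN)))  # local contrast is amplified
```

The kernel weights sum to one, so flat regions keep their brightness while a pixel that is 20 levels brighter than its neighbors ends up 100 levels brighter after sharpening.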
Image fusion, another level of image enhancement, refers to extracting the image data about the same target acquired from multiple source channels through image processing and computer technology, maximizing the useful information in the respective channels and finally synthesizing it into a high-quality image with reduced uncertainty and redundancy. Research on image fusion technology is on the rise, and its application fields span remote-sensing image processing, visible light image processing, medical image processing, infrared image processing, and so on. Zhu et al. [34] fused CT and MRI images by using M-PCNN, whose structure is shown in Fig. 8f. The fusion process consists of the following four steps. First, to avoid the blending of low-frequency subbands and thus overcome the pseudo-Gibbs phenomenon [124], the CT and MRI images were decomposed using the nonsubsampled contourlet transform (NSCT). Second, the M-PCNN neurons were stimulated using the spatial frequency of the NSCT transform domain coefficients. Third, the coefficients with a high ignition (firing) number were selected as the coefficients of the fused image. Finally, after the high- and low-frequency coefficients of the two images were fused by the two-channel M-PCNN, the fused image was reconstructed using the inverse NSCT. The results of fusing the original images (Fig. 8g) using PCNN and M-PCNN are shown in Fig. 8h. The subjective visual comparison shows that M-PCNN has superior fusion performance for medical images and yields more detailed information.
Restoration
Similar to image enhancement, image restoration also aims at improving the overall quality of the image. Unlike image enhancement techniques, which focus on increasing the contrast and processing the image according to the receiver's preference, image restoration focuses on removing the blurred parts of the image and repairing or reconstructing the degraded image, which can be considered the reverse process of image degradation.
Because the noise generated during medical imaging degrades the image quality and blurs the observed tissue boundaries, which affects medical diagnosis, it is important to remove the noise while preserving the boundary and structural information. Zhu et al. [34] used the proposed M-PCNN structure for image denoising, where each neuron is connected to the corresponding pixel and also to the neurons in the adjacent 3 × 3 neighborhood [125]. In most cases, a noise pixel correlates only weakly with, and differs significantly from, its surrounding pixels, and neurons mainly have two states: stimulated and unstimulated. The main principle of image denoising using M-PCNN is to identify the noise by judging whether each neuron and its neighboring neurons are stimulated or not. Based on the result, the brightness of the corresponding pixel is adjusted to achieve noise reduction and image recovery. The comparison in Fig. 9a shows that M-PCNN is a more suitable method for removing salt-and-pepper noise while retaining edge information.
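The idea of detecting noise by comparing the state of a neuron with those of its neighbors can be shown with a deliberately simplified stand-in. The sketch below is not the M-PCNN of [34]: it replaces the pulse-coupled dynamics with a fixed firing threshold, and the parameters `threshold` and `min_disagree` as well as the median replacement rule are assumptions chosen for this example.

```python
from statistics import median_low

def denoise(image, threshold=128, min_disagree=7):
    """Replace pixels whose firing state disagrees with most 3 x 3 neighbors."""
    rows, cols = len(image), len(image[0])
    fired = [[p > threshold for p in row] for row in image]
    out = [row[:] for row in image]
    for r in range(1, rows - 1):          # border pixels are left untouched
        for c in range(1, cols - 1):
            neigh = [(r + dr, c + dc)
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                     if (dr, dc) != (0, 0)]
            disagree = sum(fired[i][j] != fired[r][c] for i, j in neigh)
            if disagree >= min_disagree:  # isolated state: treat as noise
                out[r][c] = median_low(image[i][j] for i, j in neigh)
    return out

# A single "salt" pixel in an otherwise smooth patch.
noisy = [
    [60, 62, 61],
    [59, 255, 60],
    [61, 60, 62],
]
clean = denoise(noisy)
print(clean[1][1])
```

Only pixels judged to be noise are modified, so edges, where a pixel agrees with several of its neighbors, are left intact; this is the property that distinguishes the approach from plain median or averaging filters.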
a M-PCNN for medical image denoising. In order, the source CT images, images with salt-and-pepper noise, images denoised by median filtering, images denoised by averaging filtering, and images denoised by M-PCNN [34]. b New LPF based on memristor bridge [126]. c Testing images and filter results. i) Standard clean images. ii) Images with white Gaussian noise. iii) Images denoised by the proposed adaptive Gaussian filter [126]. d Illustration of a neuromorphic visual system utilizing plasmonic optoelectronic memristors for visual sensing, low-level image pre-processing, and high-level image processing (i.e., recognition) [35]. e Images comprising the ideal image (i), the real image with 10% random noise (ii) and the noise-reduced image (iii) after pre-processing. Image taken from the Yale Face Database B [127] [35]
Suppressing the high-frequency signal while passing the low-frequency signal is the main function of the low-pass filter (LPF), an important part of image denoising [128]. Yongbin et al. [126] designed a new LPF based on the memristor bridge circuit described in [129], consisting of four identical memristors that can perform zero, positive and negative synaptic weighting (Fig. 9b). They analyzed the memristor bridge-based LPF, whose cutoff frequency varies over time, and, inspired by the memristive filter and its typical characteristics, designed a memristive Gaussian filter and applied it to image processing. The principle is that the adaptive Gaussian kernel changes with the situation, so the variable cutoff frequency of the memristive LPF can be combined with the Gaussian filter to denoise the image. The researchers added Gaussian white noise with standard deviation σ = 10/255 to the 512 × 512-pixel image, and the Gaussian template size was set to 11 × 11. Then, the Gaussian filter proposed in [130] and the designed adaptive Gaussian filter were each used to denoise these noisy images, and the filtering results are shown in Fig. 9c. The designed filter circuit combined the variable parameters of the memristor bridge with Gaussian filters, which provided a new idea for image-filtering algorithms. However, the memristor remains only one piece of the puzzle rather than the overall system architecture, and the goal of a complete hardware implementation is not fully reached.
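How the width of a Gaussian template controls the amount of smoothing can be seen in a short sketch. The 11 × 11 template size is taken from the text; the kernel standard deviations (1.0 and 3.0) and the function name are assumptions, since the adaptation rule of [126] is not reproduced here.

```python
import math

def gaussian_kernel(size=11, sigma=1.5):
    """Normalized size x size Gaussian template; size must be odd."""
    half = size // 2
    raw = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
            for x in range(-half, half + 1)]
           for y in range(-half, half + 1)]
    total = sum(map(sum, raw))
    return [[v / total for v in row] for row in raw]

narrow = gaussian_kernel(sigma=1.0)  # weaker smoothing, higher cutoff frequency
wide = gaussian_kernel(sigma=3.0)    # stronger smoothing, lower cutoff frequency
print(narrow[5][5], wide[5][5])      # center weights of the two templates
```

A wider kernel spreads its weight over more neighbors and therefore suppresses more high-frequency content, so varying the kernel width plays the same role as varying the cutoff frequency of the LPF.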
Recently, the authors developed a plasmonic photomemory in Ag-TiO2 nanocomposite films that relies on optical excitation and the effects of localized surface plasmon resonance [35]. Such a device can integrate visual perception, low-level image pre-processing (including noise reduction and contrast enhancement), and high-level image processing functions (Fig. 9d). They utilized an 80 × 80 phototransistor array to construct a neuromorphic vision system comprising two components. The low-level preprocessed images were obtained first (Fig. 9e) and were then fed into a two-layer artificial neural network based on photomemristors to implement image learning and recognition (Fig. 9d-ii). The images show that the background noise was further smoothed after noise reduction.
Image recognition and classification
Image recognition, one of the mechanisms of computer vision, is a technique that analyzes the original image as a whole, based on its main features, to predict the category to which it belongs. In human image recognition, it is necessary to exclude redundant information from the input, extract the key information and integrate the information obtained at each stage to form a complete impression. The machine image recognition process is similar. One of the most important models for image recognition is the convolutional neural network (CNN), but it has not yet been fully implemented in hardware through memristor crossbars [131], i.e., crossbar arrays with a memristor device at each intersection. In addition, it is extremely challenging to achieve software-equivalent results because of the high variability, low device throughput, and other non-ideal characteristics of memristors [132,133,134,135].
Chu et al. [38] designed a neuromorphic hardware system for visual pattern recognition (Fig. 10a), which consists of an artificial photoreceptor, a PCMO-based memristor array and CMOS neurons. The artificial photoreceptor converted images into input voltage pulses, the memristor arrays [136] were used for synaptic connections, and leaky integrate-and-fire (I&F) neurons served as output neurons. An improved spike-timing-dependent plasticity algorithm was proposed to adjust the memristor states, i.e., the synaptic weights, during system training. The system operated on the principle that, as the output neurons integrated the currents flowing through the memristors, one of them would reach a certain threshold earlier than the others. An inhibitory signal from the discharged neuron would then freeze all neurons and reset their internal states so that the recognition process could be restarted with the next test image. The system was successfully trained and recognized images of the digits 0 to 9. When random noise was added to the training images before applying them to the system for recognition, the recognition rate decreased as the noise level increased, for example, to 85% at a 10% noise level. However, adjusting the resistive states or synaptic weights of the memristors algorithmically during training invariably increases the training time and cost. More attention needs to be paid to this point in future implementations of memristor-based systems for hardware digital image processing.
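The first-to-fire principle of the leaky I&F output neurons can be sketched in a few lines. The discrete-time leak factor, the threshold, the toy 4 × 2 conductance matrix and the function name `first_to_fire` are assumptions for illustration; they are not the circuit parameters of [38].

```python
def first_to_fire(input_voltages, conductances, threshold=1.0, leak=0.9,
                  max_steps=1000):
    """Index of the first output neuron to reach threshold, or None."""
    n_out = len(conductances[0])
    # Column currents: sum of V * G over the rows of the memristor array.
    currents = [sum(v * g_row[j] for v, g_row in zip(input_voltages, conductances))
                for j in range(n_out)]
    potentials = [0.0] * n_out
    for _ in range(max_steps):
        # Leaky integration: the potential decays, then accumulates the current.
        potentials = [leak * p + i for p, i in zip(potentials, currents)]
        winner = max(range(n_out), key=potentials.__getitem__)
        if potentials[winner] >= threshold:
            return winner  # in hardware, the winner inhibits and resets the rest
    return None

# Hypothetical array: column 0 is tuned to pattern A, column 1 to pattern B.
G = [[0.2, 0.0], [0.2, 0.0], [0.0, 0.2], [0.0, 0.2]]
print(first_to_fire([1, 1, 0, 0], G))  # pattern A
print(first_to_fire([0, 0, 1, 1], G))  # pattern B
```

The neuron whose synaptic column best matches the input receives the largest current and crosses the threshold first, which identifies the recognized class without any explicit comparison step.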
a Neuromorphic system for visual pattern recognition [38]. b Architecture of the simulated memristor-based neural processing unit [39]. c Schematic illustration of the high-level in-sensor computing by employing the sensing memristor as a synapse to implement weight updating [40]. d Output image after 60,000 training epochs for the sensing memristor under different RH levels [40]
Then, Yao et al. [39] fabricated a memristor crossbar system implementing CNNs that integrated eight 2048-cell one-transistor-one-memristor (1T1R) arrays and constructed a complete memristor-based five-layer CNN for MNIST image recognition [137], achieving an experimental recognition accuracy of 96.19% on the entire test dataset with a hybrid training scheme. In addition, the convolution kernel was replicated to three parallel memristor convolvers to reduce the mCNN latency by roughly one-third. The memristor-based CNN neuromorphic system (Fig. 10b) was also shown to be more energy efficient than state-of-the-art graphics processing units by over two orders of magnitude. While improving CNN computational efficiency, this highly integrated memristor-based system is expected to provide a feasible non-von Neumann hardware solution for edge computing and deep neural networks.
Present research on the neuromorphic aspects of memristors has mainly focused on single sensory processing such as vision, hearing, smell, and touch, whereas the human perceptual system can perceive and process diverse types of information simultaneously in complex environments. Wang et al. therefore proposed an MXene-ZnO-based multimodal flexible sensing memristor that combines visual data sensing, relative humidity (RH) sensing, and pre-processing functions [40]. In the simulation, the single-layer perceptron (SLP) consists of 28 × 28 neurons (785 input neurons), ten output neurons (ten categories, from 0 to 9), and fully connected 785 × 10 synaptic weights. Presynaptic neurons perceived inputs of 28 × 28 MNIST digits and transmitted them as synaptic forward potentials. The synaptic weights were modulated by humidity and light, and the post-synaptic neurons then responded to them and performed the perceptual task (Fig. 10c).
The researchers found that the artificial vision network trained at 60% RH captured more features from the selected characters than networks trained at low or high humidity (Fig. 10d) and could achieve high recognition accuracy after 60,000 training cycles. These results show that multimodal sensing memristors can be applied to both low- and high-level in-sensor computing while reducing power consumption and chip area.