1 Introduction

In the era of big data, artificial intelligence (AI) has made breakthroughs in facial recognition, autonomous driving, intelligent robots and other fields. At present, AI is implemented mainly through algorithms, which require chips with large computational power for data processing. Chip computational power and AI development are therefore complementary [1,2,3,4,5]. The separation of the central processing unit (CPU) and memory in the traditional von Neumann architecture causes latency and energy consumption during data transfer (Fig. 1a) [6, 7]. Furthermore, although CPU performance (ns-level processing) has improved greatly with the development of integrated circuit technology, the slow access speed of memory (μs level) causes severe delays and limits overall performance [8,9,10,11]. To break through these bottlenecks, NVIDIA's multi-core graphics processing unit (GPU) and Google's tensor processing unit (TPU), both with a processing-near-memory architecture, and in-memory computing (IMC) technology based on nonvolatile memory (NVM) have emerged in recent years (Fig. 1b) [12, 13]. In artificial neural networks, in-memory computing enables highly efficient data-intensive computation because it eliminates data migration and access. Vector–matrix multiplication (VMM) is a key operation in artificial neural networks. A crossbar array constructed with NVMs can perform the VMM operation in one step by following circuit laws [14]. The input voltage vector is applied to the wordlines in parallel; each cell produces a current equal to its programmed conductance times the applied voltage (Ohm’s law), and the cell currents accumulate along each bitline (Kirchhoff’s current law), so the bitline currents give the product of the conductance matrix and the voltage vector.
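The crossbar VMM described above can be illustrated with a minimal sketch. It assumes an idealized array (no wire resistance, no sneak-path currents, perfectly linear cells); the function name `crossbar_vmm` and the conductance and voltage values are hypothetical and chosen only for illustration, not taken from the cited works.

```python
def crossbar_vmm(conductances, voltages):
    """Return the bitline currents I_j = sum_i G[i][j] * V[i].

    conductances: one row per wordline, one column per bitline (siemens).
    voltages: one read voltage per wordline (volts).
    """
    if len(conductances) != len(voltages):
        raise ValueError("one voltage per wordline is required")
    n_bitlines = len(conductances[0])
    currents = [0.0] * n_bitlines
    for g_row, v in zip(conductances, voltages):
        for j, g in enumerate(g_row):
            # Ohm's law gives the cell current g * v; Kirchhoff's current
            # law sums the currents of all cells sharing bitline j.
            currents[j] += g * v
    return currents


# Hypothetical 3 x 2 array: 3 wordlines, 2 bitlines.
G = [[1e-6, 2e-6],
     [3e-6, 4e-6],
     [5e-6, 6e-6]]
V = [0.1, 0.2, 0.3]
I = crossbar_vmm(G, V)  # roughly 2.2 uA on bitline 0 and 2.8 uA on bitline 1
```

In hardware the two nested loops do not exist: all cell currents flow at the same time, which is why the array completes the multiplication in a single step.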
Thanks to advances in science and technology, a large number of NVMs have emerged, including NAND flash, resistive random-access memory (RRAM), magneto-resistive RAM (MRAM), phase change RAM (PCRAM) and ferroelectric memory (FeM) [15,16,17,18]. Among them, FeM devices have distinct advantages in power consumption, operation speed and endurance (Fig. 1c). For example, ferroelectric RAM (FeRAM) has faster read/write speeds and better endurance than other RAMs, and the read/write speed and endurance of ferroelectric field-effect transistors (FeFETs) are also better than those of commercially available NAND flash. However, as will be discussed below, there is still room to shrink the cell size of FeMs and thereby achieve higher integration density.

Fig. 1

a Memory and CPU in the von Neumann architecture. b The technical roadmap for improving computing efficiency. c Performance comparison of existing NVMs. Here, “FeRAM:10/10” means that the read/write time of FeRAM is 10/10 ns, and the remaining labels follow the same convention. Data are obtained from Refs. [19,20,21,22,23,24]

Ferroelectric materials have a spontaneous polarization that can be switched by an electric field. Notably, multiple stable polarization states can be configured by precisely controlling the parameters of the electric field (e.g., amplitude, frequency and duration) [25,26,27,28]. Because the polarization states are regulated by the electric field rather than by current, Joule heating is avoided and energy consumption is significantly reduced. The fast speed and low energy cost of polarization switching allow FeMs to deliver high computational power. For instance, a ferroelectric tunnel junction (FTJ) array has been reported to reach 100 tera-operations per second per watt [29]. An ideal FeM requires the ferroelectric material to possess the following characteristics: (1) Good CMOS compatibility [30,31,47,48,49], SrBi2Ta2O9 [50], BiFeO3 [51,52,53], KxNa1−xNbO3 [54, 55], HfxZr1−xO2 [56,57,58], AlScN [59,60,61,62], PVDF [63,64,65,66], molecular ferroelectric [67, 68]