Introduction

The synergies between a tremendous amount of data, artificial intelligence (AI), and data science impact almost every aspect of industries and our lives1,2,3. This trend demands substantial computing resources to carry the heavy workloads of machine learning algorithms and to handle the huge data volume. As a new computing paradigm, in-memory computing enables highly efficient data-intensive computation because no energy is spent on data migration and access4,5,6. For example, the vector-matrix-multiply (VMM) calculation, one of the most frequent operations in machine learning algorithms, can be implemented by the memory-based multiply-accumulate (MAC) operation without the data migration/access steps. The ongoing expansion of data volume and complexity drives the need for a high-density, low-energy, and high-precision in-memory computing array. This, in turn, increases the number of input and output nodes of the memory array for in-memory computing. Because the size of the memory array is tied to the chip area, the power consumption, and the accuracy of the MAC operation, in-memory computing with artificial neural networks (ANNs) must be developed using high-density integration technology and high-performance devices.

The simplest way to fit more cells into a given area is to shrink the in-plane dimensions. However, as the minimum dimension limits the fabrication and performance of the device, down-scaling technology is migrating to three-dimensional (3D) integration, which exploits the vertical space by stacking tiers or cells instead of reducing the in-plane sizes7. Despite the intensive development of various vertical integration technologies, including 3D stacked integrated circuits (3D-SICs) based on through-silicon vias (TSVs) and monolithic 3D integrated circuits (M3D-ICs), several issues still hinder further expansion in the vertical direction8,9,10,11,12,13,14,15. First, the thermal budget for the top-layer devices is insufficient to achieve high-quality devices on back-end interconnects or even on front-end devices. Second, the severe increase in height with stacking demands more elaborate fabrication processes, such as anisotropic etching, flattening, alignment, and chip warpage control. In this light, thin van der Waals (vdW) materials are very attractive candidates for implementing 3D stacked devices16,17,18,19,20. vdW materials consist of atomic layers with strong chemical bonding within each layer and weak bonding between layers21. This bonding structure facilitates the transfer of a grown film to the target substrate under low-temperature conditions22. The transfer technique is not yet mature enough for industrial applications, but various transfer methods have been developed, and some have succeeded in transferring a large-scale film grown by chemical vapor deposition (CVD) onto a target film23,24,25. The ultrathin thickness of vdW materials also benefits the 3D stacked structure because it reduces the total height and keeps the surface morphology flat26,27. Furthermore, the dangling-bond-free surface minimizes roughness, traps, misfit strain, and defects at the interface28,29,30,31. In terms of device performance, this superior surface delivers stable memory characteristics and immunity to mobility degradation in the ultrathin channel32,33.

We propose a two-tier stacked device structure using various vdW materials to enable efficient MAC operation in 3D in-memory computing. Notably, we employ a specially designed ferroelectric field-effect transistor (FeFET) as the memory device for the neural network. In contrast to conventional FeFETs, the gate electrode of our device is positioned on the side of the ferroelectric layer. The lateral gate is made functional by leveraging the unique properties of α-In2Se3, namely, the interlocking feature between in-plane (IP) and out-of-plane (OOP) polarizations, and the stable ferroelectric characteristics at room temperature in the ultrathin scale34,35,36,37.

Fig. 4i confirms the consistent current of the programmed (low conductance) and erased (high conductance) states. We briefly compare the features of the LG-FeFET and HZO-based FeFETs in Supplementary Table 2. The LG-FeFET exhibits a larger memory window (approximately 10 V) than HZO-based FeFETs (<5 V)61. The endurance of the LG-FeFET is comparable with that of HZO-based FeFETs, but its retention is shorter. Due to the unique semiconducting properties of α-In2Se3, the coercive field (Ec) and the remanent polarization (Pr) cannot be directly compared.

Fig. 4: Reliability characteristics of the LG-FeFET.

a, b PFM images of the OOP and IP phase patterns, respectively, during the retention time. c The change of the OOP and IP phases in the inner square and outer region. The size of both areas is 1 × 1 μm. The portions of each polarization are calculated by counting the number of pixels for each polarity. d The number of pixels for each phase during the retention time. e The average values of each distribution and f the standard deviation during the retention time. g Transient profiles of the phases across the inner square and outer region. h The dipole configuration and the net polarization charge at the boundary. i The drain current as a function of the number of program-erase cycles.

Stacked LG-FeFET array for in-memory computing

The LG-FeFET offers distinct benefits in reducing the vertical height and regulating weight values because it needs no metal gate in the vertical stack and provides a considerable memory window. To increase storage density and lower energy consumption, we suggest a 3D stacked design that exploits these characteristics of the LG-FeFET. Fig. 5a shows the overall schematic and cross-sectional images of the proposed 3D stacked structure. The unit cell of this 3D structure consists of a selection transistor and a memory transistor. The selection transistor’s primary function is to control the memory transistor through a combination of signals from the bit-line (BL) and word-line (WL). These signal lines connect to the drain and gate of the selection transistor, respectively. The memory transistor, in turn, carries out the multiplication during the MAC operation. An input signal in the form of a voltage is accepted through the input line (IL) and transmitted through the channel. Following Ohm’s law, this voltage is converted into a current that is proportional to the channel conductance and is sent out to the common source line (CSL). The CSL and BL vertically connect the unit cells along the z-axis, while groups of cells are connected in parallel along the y-axis. A ‘MAC plane’ is defined as a group of cells that are connected by the same CSL on the y-z plane, and it serves as the fundamental unit for performing MAC operations. The total number of MAC planes required depends on the number of output nodes. The corresponding equivalent circuit is shown in Fig. 5b. A more detailed operation scheme of the MAC planes is described in Supplementary Table 3. Input data are received by the MAC plane via the ILs, and a single output current is generated through the CSL (Fig. 5c). To demonstrate the feasibility of the 3D stacked memory structure utilizing LG-FeFETs, we carried out an experimental validation of the MAC operation in two stacked LG-FeFET devices (Fig. 5c).
As shown in Fig. 5d, two LG-FeFET devices are linked to the CSL, from which the output currents are read. Different input voltages were applied to the 1F and 2F ILs of the LG-FeFET devices, each with distinct weight values, resulting in different CSL currents. The currents correspond to the product of the input signal amplitudes and the respective weight values. Subsequently, these currents are accumulated at the CSL, following Kirchhoff’s law, as the sources of the two LG-FeFET devices are interconnected at a shared node. Two such cases are shown in Fig. 5d. The efficiency of the vertically stacked structure is contingent on the number of tiers. As the number of tiers is raised while keeping the density the same, less chip area is necessary, although the fabrication processes for vertical interconnection become more intricate at the same time62,63. Fortunately, the 3D structure employing the LG-FeFET can alleviate this difficulty by relocating the gate electrode, regardless of whether the devices are stacked in a sequential or alternative manner, which reduces the overall height. The reduction in area and total height is shown as a function of the number of stacks in Fig. 5e. Scaling the LG-FeFET vertically reduces the total height, which in turn decreases both the vertical directional resistance and the energy consumption during a read operation. More details about the energy consumption and vertical resistance as a function of the number of tiers can be found in Supplementary Fig. S12. In comparison to a conventional vertical gate memory device, the LG-FeFET exhibits a much larger memory window, which implies that the device’s channel conductance varies over a wide range at a read gate voltage. Figure 5f illustrates the contrast in the dynamic ranges of the conductance between the vertical gate and lateral gate on the same device.
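The multiply and accumulate steps described above can be sketched in a few lines of Python. This is a minimal illustration, assuming ideal ohmic channels and a lossless shared source node; the function name `mac_current` and the voltage and conductance values are hypothetical and are not taken from the measurements in Fig. 5d.

```python
def mac_current(input_voltages, conductances):
    """Return the summed current (A) at a shared source node.

    Each cell multiplies its input voltage (V) by its channel
    conductance (S), following Ohm's law. The per-cell currents then
    add at the common node, following Kirchhoff's current law.
    """
    if len(input_voltages) != len(conductances):
        raise ValueError("one conductance is required per input voltage")
    return sum(v * g for v, g in zip(input_voltages, conductances))


# Hypothetical two-tier example: the values below are assumptions
# chosen for illustration only.
v_in = [0.1, 0.2]        # input voltages on the 1F and 2F input lines (V)
weights = [2e-6, 5e-6]   # programmed channel conductances (S)

i_csl = mac_current(v_in, weights)
print(f"CSL current: {i_csl:.3e} A")
```

Changing either the input voltages or the stored conductances changes the summed current, which is the behavior the two-tier measurement probes.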
To obtain a linear change of the conductance, incremental step pulses with an identical read voltage were applied to both the vertical and lateral gates. The incremental step pulse program/erase (ISPP/ISPE) conditions were separately optimized for the vertical and lateral gates. For the ISPP condition, we set the start voltage at 2.3 V and the stop voltage at 4.0 V, with an increment of 13 mV. The ISPE condition, on the other hand, had a start voltage of −3.0 V and a stop voltage of −4.0 V, with an increment of 8 mV. Both pulse rates were maintained at 1 kHz. The states were verified at a gate voltage of 0.7 V after each program/erase pulse. The lateral gate of the LG-FeFET exhibits a significantly larger dynamic range of conductance, measuring 55, which is about 18 times larger than the vertical gate’s range of 3. The LG-FeFET is therefore particularly advantageous for achieving more precise MAC operations, owing to its ability to handle a broad range of weight values. To evaluate the improved performance of the LG-FeFET at the system level, we conducted an image recognition simulation using a convolutional neural network (CNN) and the CIFAR-10 dataset64,65. As displayed in Fig. 5g, the lateral gate achieved an accuracy of 92.6%, whereas the vertical gate remained at 80.4%.
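The pulse conditions quoted above imply a fixed number of program and erase steps, which the following sketch enumerates. It is an illustration only: the helper name `pulse_schedule_mv` is hypothetical, amplitudes are held as integer millivolts to avoid floating-point drift, and the sketch assumes that the last pulse never exceeds the stop voltage and that one pulse is issued per 1 kHz period.

```python
def pulse_schedule_mv(start_mv, stop_mv, step_mv):
    """List pulse amplitudes (mV) from start toward stop.

    step_mv is the magnitude of the increment. The last amplitude is
    the final step that does not go beyond stop_mv (an assumption of
    this sketch, not a statement about the measurement setup).
    """
    if step_mv <= 0:
        raise ValueError("step_mv must be a positive magnitude")
    direction = 1 if stop_mv >= start_mv else -1
    count = abs(stop_mv - start_mv) // step_mv + 1
    return [start_mv + direction * k * step_mv for k in range(count)]


PULSE_RATE_HZ = 1000  # 1 kHz pulse rate quoted in the text

ispp = pulse_schedule_mv(2300, 4000, 13)    # program: 2.3 V to 4.0 V, 13 mV step
ispe = pulse_schedule_mv(-3000, -4000, 8)   # erase: -3.0 V to -4.0 V, 8 mV step

for name, schedule in (("ISPP", ispp), ("ISPE", ispe)):
    duration_s = len(schedule) / PULSE_RATE_HZ
    print(f"{name}: {len(schedule)} pulses, last amplitude "
          f"{schedule[-1] / 1000:.3f} V, about {duration_s:.3f} s in total")
```

Under these assumptions the program sweep takes 131 pulses and stops at 3.990 V, because 13 mV does not divide the 1.7 V span evenly, while the erase sweep takes 126 pulses and ends exactly at −4.0 V.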

Fig. 5: A vertically stacked 3D in-memory computing array structure and its applications.

a A vertically stacked 3D in-memory computing structure composed of LG-FeFETs. The top view and side views are displayed in the right panel. b The equivalent circuit diagram corresponding to the structure shown in a. c The OM image of the two-tier LG-FeFET device for the MAC operation. d The results of the MAC operation on the two-tier LG-FeFET device for various input signals and weight values. e The reduction of the area and height as a function of the number of tiers at the same density. f The comparison of the dynamic range between the vertical and lateral gates. g The results of the image recognition simulation, which incorporates the dynamic ranges of the vertical and lateral gates, using a convolutional neural network (CNN) and the CIFAR-10 dataset.

Discussion

We have successfully developed a two-tier stacked ferroelectric memory that takes advantage of the distinctive material properties of α-In2Se3. The memory device is constructed using a laterally gated FeFET (LG-FeFET) structure, wherein the gate is positioned on a section of the ferroelectric layer to apply an IP directional electric field. The primary operational mechanism of the LG-FeFET is based on the interlocking effect between the IP and OOP polarizations, which has been verified through PFM and KPFM measurements. Our experimental findings confirm that the IP directional electric field was 20 times more efficient in polarization switching than the OOP directional electric field. As a result, the LG-FeFET demonstrated a broader memory window than the vertically gated FeFET. Additionally, we examined the retention characteristics of both IP and OOP polarizations simultaneously. Both polarizations maintained their polarized states for a duration exceeding 3 × 10⁴ s. Nevertheless, the two polarizations changed in distinct ways during the retention period: the IP polarization degraded from the boundary, while the OOP polarization degraded across the entire pattern. Using stacked LG-FeFETs, we have created an in-memory computing array and have successfully performed a MAC operation with a two-tier stacked memory. The wide dynamic range of LG-FeFETs provided enhanced accuracy for ANN-based computing compared to the vertical gate structure.

Methods

Fabrication of the LG-FeFET

The substrate is a heavily p-doped silicon wafer with a thermally grown 90 nm thick SiO2 layer. The α-In2Se3 (2H), h-BN, and MoS2 flakes were mechanically exfoliated in turn from their bulk crystals using an adhesive tape (224SPV, Nitto). The α-In2Se3, h-BN, and MoS2 flakes were transferred onto a polydimethylsiloxane (PDMS) layer and then sequentially transferred onto the SiO2 surface using a dry-transfer machine. The α-In2Se3 flake was partially covered with the h-BN flake, and the channel material (MoS2) was placed on the overlapped region of α-In2Se3 and h-BN. Electron beam lithography (EBL) was used to define the gate, drain, and source regions. The EBL resists, poly(methyl methacrylate) (PMMA) A4 and A6, were spin-coated at 3000 rpm for 60 s and baked at 180 °C for 120 s. The gate was defined on the opened region of the α-In2Se3 flake, including the edge. After forming the PMMA pattern, Ti/Au (10 nm/80 nm) was deposited using an e-beam evaporator. The metal outside the defined electrode regions was removed by the lift-off process. Before stacking another LG-FeFET device, a thick interlayer dielectric (ILD) was transferred onto the first device to separate the tiers electrically. The aforementioned LG-FeFET process was repeated to fabricate the second-tier device. To achieve consistency between the first and second tiers, we selected exfoliated flakes based on their color classification and confirmed their thickness using AFM measurements. This approach ensured uniformity in our samples.

Characterization of the materials and devices

The surface morphologies of the devices were inspected by AFM measurement using a non-contact cantilever probe with a high resonant frequency (PPP-NCHR, Nanosensors). The OOP and IP phases of the α-In2Se3 were inspected by PFM measurement using a conductive tip coated with Cr-PtIr5 (PPP-CONTSCPt, Nanosensors). The surface potential was analyzed via KPFM measurement using a conductive tip coated with Cr-Au (NSC14/Cr-Au, MikroMasch). The AFM, KPFM, and PFM analyses were performed using an NX10 system (Park Systems Corp.). The current-voltage characteristics were investigated using Keysight B2912A, B2902A, and B1500A semiconductor parameter analyzers. All the measurements were conducted at room temperature in air.