Introduction

The deployment of Internet of Things (IoT) and edge devices for ubiquitous learning, sensing, and human-machine interaction is growing rapidly1,2. These devices demand integrated intelligence in low-power, small-area, and computationally efficient hardware. Such computational systems consume significant power even when idle, because they continuously process the incoming data. For energy-constrained devices and applications in which most computations are complex and energy-intensive, a wake-up system therefore offers a substantial reduction in power consumption. Unlike the computationally complex back-end recognition module, the wake-up system must be highly energy-efficient and able to accurately perform simpler classification tasks that decide whether or not to turn on the main processing system. As shown in Fig. 1a, the wake-up system is an always-on module that acts as a moderator between the real-world sensor unit and the main computational unit. It consists of a classification module that recognizes the ambient conditions and, once they are detected, powers on the main processing system. Thus, the computationally and power-intensive module remains switched off and does not process the sensor data until the ambient conditions are met. Because battery resources in edge computing devices are limited, high accuracy, high energy efficiency, and small area are key design requirements for such wake-up devices. The wake-up system could be an activity detector, for example for human motion detection, vibration detection in seismic monitoring systems, or speech/non-speech recognition. Although a few recent works have addressed low-power wake-up systems3,4, we present a novel trainable and biologically inspired framework that utilizes memtransistors as analogue memories.
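The energy argument for a wake-up module can be made concrete with a simple duty-cycle model: the always-on wake-up circuit draws its power continuously, while the main processor draws power only for the fraction of time it is switched on. The sketch below assumes that power-up and power-down overheads are negligible, and the numbers in the usage lines are hypothetical placeholders rather than measurements from this work.

```python
def average_power(p_wake: float, p_main: float, active_fraction: float) -> float:
    """Average power of a wake-up-gated system (duty-cycle model, assumed).

    p_wake: power drawn continuously by the always-on wake-up module (W)
    p_main: power drawn by the main processing system while it is on (W)
    active_fraction: fraction of time the main system is switched on, in [0, 1]
    Switching (power-up/power-down) overheads are ignored.
    """
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must lie in [0, 1]")
    return p_wake + active_fraction * p_main


def power_saving_ratio(p_wake: float, p_main: float, active_fraction: float) -> float:
    """How many times lower the gated average power is than keeping the main system always on."""
    return p_main / average_power(p_wake, p_main, active_fraction)


if __name__ == "__main__":
    # Hypothetical values: 10 uW wake-up module, 10 mW main processor, on 1% of the time.
    p_avg = average_power(10e-6, 10e-3, 0.01)
    ratio = power_saving_ratio(10e-6, 10e-3, 0.01)
    print(f"average power: {p_avg * 1e6:.1f} uW, saving: {ratio:.1f}x")
```

With these placeholder values the gated system averages about 110 uW, roughly 90 times less than keeping the main processor on. The saving shrinks as the wake-up module's own power or the active fraction grows, which is why the wake-up module itself has to be highly energy-efficient.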

Figure 1

CMOS-Memtransistor hybrid architecture framework of population coding. (a) Functional block diagram of the generalized system for activity detection using a wake-up module. (b) Population coding architecture. The encoding weight matrices (we1 and we2) are fixed random projections of the input-layer stimuli. The decoding weight matrix (wd) contains adaptable weights that decode the population coding behavior. The output layer is decoded from the ensemble of hidden nodes. (c) Architecture design of the CMOS-memtransistor hybrid framework for population coding, consisting of three modules: the TAB architecture, the synaptic memtransistor, and sign-based online update learning (SOUL).

We propose a biologically inspired wake-up system with embedded intelligence and a low energy footprint that can be integrated with existing edge computing devices to improve their energy efficiency. The proposed framework uses a population coding scheme, in which information is encoded by the activity of an ensemble of neurons, as in the olfactory, motor, and visual cortices5,6,7,8.

The system uses a three-terminal architecture with atomically thin molybdenum disulphide (MoS2) as the active channel that hosts the analogue memory. Such a gate-driven memtransistor architecture differs from conventional two-terminal memristor devices and allows operation at very low power. In this work, the term memtransistor refers to a memory device operating in the transistor geometry; it should not be confused with the similar terminology used in ref. 9. The framework simultaneously exploits device mismatch to implement randomness in the population coding scheme. Memtransistors in neuromorphic circuits offer a promising realization of synapses, variable weight storage, and many other applications10,11,12,13. Neuronal non-linearity and random weights are implemented in chips fabricated in 65 nm (Single-Input Single-Output, SISO)14,15 and 130 nm (Multiple-Input Single-Output, MISO)16 CMOS technology nodes. These chips use the random device mismatch present in smaller technology nodes to implement fixed random weights in the architecture. Moreover, the neuronal non-linearity in these chips can be tuned externally using a systematic offset to make the system more heterogeneous14,15,16.

In this framework, we used characteristic measurement data from our fabricated MoS2 synaptic memtransistor to implement analogue memory as the memtransistor's memductance (the conductance of the memtransistor). Layered semiconducting transition metal dichalcogenides (TMDCs), including MoS2, MoSe2, WS2, and WSe2, and group III-VI semiconductors such as GaSe are known to demonstrate non-volatile memory behavior in two-terminal memristor or three-terminal transistor geometries17,18,19,20,21,22,23. This is attributed to the transport gap in their electronic band structure, which leads to a large variation in channel resistance under the influence of a gate or drain bias. The high (program) and low (erase) resistance states can be used to store information in memory applications. However, most reported MoS2-based non-volatile memory devices rely on a large drain bias, which leads to substantial power dissipation. To overcome this shortcoming, we implemented an extended floating gate (FG) geometry in the current device. This is done by lithographically connecting the graphene underlayer to a floating gold electrode, which enhances the effective coupling between the Si++ control gate (global back gate) and the MoS2 channel. This enhanced coupling is responsible for the observed improvement in device performance markers, such as a subthreshold swing of 77–80 mV/decade24 (Supplementary Fig. 2), and reduces the voltage required for analogue memory action. Previously reported MoS2-based synaptic memtransistors relied on bias-induced motion of defect states in thin films grown by chemical vapour deposition (CVD) to demonstrate the hysteresis effect9. In the current device, by contrast, we use electric-field-driven out-of-plane charge transfer between the channel and the FG to demonstrate pulsed multi-state memory behavior, similar to that of a biological synapse.
The trilayer device used for this purpose comprises an exfoliated single-layer MoS2 channel, a hexagonal boron nitride (hBN) tunnel barrier, and a graphite floating gate. It operates as a floating-gate memory, in which charge carriers tunnel from the channel through the tunnel barrier into the floating gate25,26,27. The device emulates synaptic plasticity while keeping the energy dissipation below 0.3 pJ for long-term potentiation (LTP) and 20 pJ for long-term depression (LTD).

Such a hybrid framework, which uses analogue subthreshold circuits for computation and a memtransistive device as a multi-state analogue memory, not only saves power (in both the designed circuits and the memtransistive device) but also improves computational efficiency. The synaptic memtransistor memory thus provides two functions simultaneously: it replaces digital memory with an adaptable multi-state memductance, and it performs an inherent multiplication (the device current is the product of its memductance and the applied voltage), with the resulting currents summed by Kirchhoff's current law (KCL). We tested the proposed framework on classification and regression tasks using both offline learning and a simplified sign-based online learning technique28. Simulation and testing of the framework were carried out using data from the fabricated chips and characteristic measurement data from the fabricated synaptic memtransistor. We believe that this hybrid architecture paves the way toward a low-power, fault-tolerant computing paradigm that is robust to variability.
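The multiply-accumulate role of the memtransistors can be illustrated with an idealized model. The sketch below assumes, hypothetically, that each hidden neuron's output is applied as a small source-drain voltage across one memtransistor whose memductance encodes the decoding weight; each device current is then the product of memductance and voltage, and the device currents add at the shared output node by KCL. Device non-linearity, wire resistance, and the mapping of signed weights onto positive memductances are not modeled.

```python
import numpy as np


def kcl_output_current(memductances, voltages) -> float:
    """Idealized multiply-accumulate in a memtransistor array (assumed model).

    memductances: memductance G_i of each device (S)
    voltages: source-drain voltage V_i applied to each device (V)
    Each branch carries I_i = G_i * V_i, and the branch currents are summed
    at the common output node by Kirchhoff's current law.
    """
    g = np.asarray(memductances, dtype=float)
    v = np.asarray(voltages, dtype=float)
    if g.shape != v.shape:
        raise ValueError("memductances and voltages must have the same shape")
    branch_currents = g * v              # multiplication in each device
    return float(branch_currents.sum())  # accumulation by KCL at the output node


if __name__ == "__main__":
    # Hypothetical memductances (S) and read voltages (V) near the 50 mV bias of Fig. 3.
    g = [1e-6, 2e-6, 0.5e-6]
    v = [0.05, -0.02, 0.04]
    print(f"output current: {kcl_output_current(g, v):.3e} A")
```

The result equals the dot product of the weight and input vectors, which is exactly the operation the decoding layer needs, without a separate digital multiplier or weight fetch.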

Hybrid architecture framework

The wake-up system architecture based on the population coding scheme is shown in Fig. 1b. The input layer is connected all-to-all to the first hidden (encoding) layer, while connectivity between the other layers is sparse. This sparsity, combined with the use of two hidden layers, provides greater randomness for expanding the input stimuli into a higher-dimensional feature space and hence improves the representational capacity of the network29,30. The input stimuli are encoded using fixed random weights for each hidden-layer neuron. The weights from the second hidden layer to the output layer are learnt for the given regression or classification task by minimizing the error with the least squares method (LSM). The outputs are thus determined by the ensemble of hidden-layer neurons. Figure 1c shows the architecture of the proposed CMOS-memtransistor hybrid framework using the population coding scheme. The framework incorporates three components: the trainable analogue block (TAB)14,15,16, the synaptic memtransistor device, and sign-based online update learning (SOUL)28. The TAB architecture uses random device mismatch between transistors to form fixed random weighted sums of the input stimuli, and it adds a non-linearity to each hidden neuron. The memtransistor is used as an in-memory computing device that stores the trainable weights as multi-state analogue values and also performs the multiplication operations. The SOUL algorithm, a hardware-efficient version of the online update rule, updates the memductance values based on the correlation between the sign of the output error signal and the sign of the hidden-layer neuron outputs. The detailed architectures of these components are discussed in the following sections. Combined with tunable hyper-parameters (threshold error and gain control), these components show the potential to achieve robust, fault-tolerant, low-power, and small-area systems.
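The encode-then-decode scheme of Fig. 1b is closely related to random-feature networks, and a software analogue helps show why only the output weights need training. The sketch below is an assumption-laden stand-in, not the chips' behavior: Gaussian random weights and uniform offsets imitate the mismatch-defined encoding weights we1 and we2, tanh replaces the measured neuronal tuning curves, the connection density of the sparse layer is a hypothetical choice, and the decoding weights wd are obtained with NumPy's least-squares solver.

```python
import numpy as np


def make_encoder(n_in, n_h1, n_h2, density, rng):
    """Fixed random encoding weights standing in for TAB device mismatch.

    we1 connects the input all-to-all to hidden layer 1; we2 sparsely connects
    hidden layer 1 to hidden layer 2 (only a `density` fraction of links kept).
    Gaussian weights, uniform offsets, and the scaling are assumptions.
    """
    we1 = rng.normal(size=(n_in, n_h1))
    b1 = rng.uniform(-1.0, 1.0, size=n_h1)
    mask = rng.random((n_h1, n_h2)) < density
    we2 = rng.normal(size=(n_h1, n_h2)) * mask / np.sqrt(density * n_h1)
    b2 = rng.uniform(-1.0, 1.0, size=n_h2)
    return we1, b1, we2, b2


def hidden_activity(x, we1, b1, we2, b2):
    """Activity of the second hidden layer; tanh stands in for the TAB non-linearity."""
    h1 = np.tanh(x @ we1 + b1)
    return np.tanh(h1 @ we2 + b2)


def fit_decoder(h, y):
    """Decoding weights wd minimizing ||h @ wd - y||^2 (least squares)."""
    wd, *_ = np.linalg.lstsq(h, y, rcond=None)
    return wd


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)  # single input (SISO-like)
    y = np.sin(np.pi * x[:, 0])                     # example regression target
    enc = make_encoder(n_in=1, n_h1=100, n_h2=100, density=0.25, rng=rng)
    h = hidden_activity(x, *enc)
    wd = fit_decoder(h, y)
    rmse = np.sqrt(np.mean((h @ wd - y) ** 2))
    print(f"training RMSE: {rmse:.4f}")
```

Only wd is learnt; the encoder stays fixed, in the same way the mismatch-defined weights of the TAB are fixed at fabrication. For classification, the same decoder can be fit to class targets (for example ±1) and its output thresholded.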

Trainable analogue block (TAB)

In the wake-up architecture, the TAB uses device mismatch to randomly project the input into a higher-dimensional feature space. The first prototype TAB chip for a single input (SISO)14, with 456 hidden neurons, was fabricated in a 65 nm technology node; a generalized TAB framework for multiple inputs (MISO)16, with 100 hidden neurons, was then built in a 130 nm technology node. The learning capabilities of both chips were demonstrated on regression and classification tasks.

Figure 2 shows the schematic of the hidden-neuron building block in the SISO and MISO TAB designs. Figure 2a shows an operational transconductance amplifier (OTA) with differential inputs V1 and V2 and a bias voltage Vb that sets the bias current Ib. The currents in transistors M1 and M2 and the OTA output current Iout are described31 by Eqs (1)–(3), where UT is the thermal voltage and η is the slope factor32, which ranges from 1.1 to 1.5 in the weak inversion region. For multiple inputs (MISO), the weighted input summation for each hidden neuron is performed by the weighted average block (WAB) shown in Fig. 2b, whose effective output Vout is given by Eq. (4). Figure 2c shows the schematic of the neuronal non-linearity block, which is cascaded after the WAB in each hidden neuron; here, Vin is connected to Vout of the WAB. Owing to process variations, random device mismatch in the differential pair and the transconductance amplifier leads to random weights and different non-linear activation functions. Further randomness can be introduced by applying different Vref and Vb to different hidden neurons. In Fig. 2c, Itanh is the output current of the hidden neuron, and the second output, signH, is required by the SOUL algorithm for the online update of the weights (here, the memductance). For a single input (SISO), the WAB is not required, and the input is connected directly to the neuronal non-linearity block of each hidden neuron. Figure 2d,e show the neuronal tuning curves of the SISO and MISO TAB architectures, illustrating the variation in offset and current amplitude obtained by varying the reference and bias voltages.

$${{\rm{I}}}_{1}={{\rm{I}}}_{{\rm{b}}}[\exp \,(\frac{{{\rm{V}}}_{1}}{\eta {{\rm{U}}}_{{\rm{T}}}})]/[\exp \,(\frac{{{\rm{V}}}_{1}}{\eta {{\rm{U}}}_{{\rm{T}}}})+\exp \,(\frac{{{\rm{V}}}_{2}}{\eta {{\rm{U}}}_{{\rm{T}}}})]$$
(1)
$${{\rm{I}}}_{2}={{\rm{I}}}_{b}[\exp \,(\frac{{{\rm{V}}}_{2}}{\eta {{\rm{U}}}_{{\rm{T}}}})]/[\exp \,(\frac{{{\rm{V}}}_{1}}{\eta {{\rm{U}}}_{{\rm{T}}}})+\exp \,(\frac{{{\rm{V}}}_{2}}{\eta {{\rm{U}}}_{{\rm{T}}}})]$$
(2)
$${{\rm{I}}}_{{\rm{out}}}={{\rm{I}}}_{{\rm{b}}}\,\tanh \left(\frac{{{\rm{V}}}_{1}-{{\rm{V}}}_{2}}{2\eta {{\rm{U}}}_{{\rm{T}}}}\right)\approx {{\rm{g}}}_{{\rm{m}}}({{\rm{V}}}_{1}-{{\rm{V}}}_{2}),\quad {{\rm{g}}}_{{\rm{m}}}=\frac{{{\rm{I}}}_{{\rm{b}}}}{2\eta {{\rm{U}}}_{{\rm{T}}}}$$
(3)
$${{\rm{V}}}_{{\rm{out}}}=\frac{{\sum }_{{\rm{i}}=1}^{{\rm{N}}}{{\rm{g}}}_{{\rm{mi}}}{{\rm{V}}}_{{\rm{i}}}}{{\sum }_{{\rm{i}}=1}^{{\rm{N}}}{{\rm{g}}}_{{\rm{mi}}}}$$
(4)
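Eqs (1)–(4) can be evaluated directly to see how the OTA and WAB behave. The sketch below implements them as written; the slope factor η = 1.3 (inside the 1.1–1.5 range quoted above), the thermal voltage at about 300 K, and the lognormal spread of bias currents and Gaussian spread of reference voltages used to mimic mismatch are all illustrative assumptions rather than values extracted from the fabricated chips.

```python
import numpy as np

U_T = 0.0258  # thermal voltage kT/q at about 300 K (V)


def ota_branch_currents(v1, v2, i_b, eta=1.3):
    """Eqs (1) and (2): subthreshold differential-pair branch currents.

    Written as Ib / (1 + exp(...)), which is algebraically identical to
    Eqs (1)-(2) but avoids overflow for large gate voltages.
    """
    d = (v1 - v2) / (eta * U_T)
    i1 = i_b / (1.0 + np.exp(-d))
    i2 = i_b / (1.0 + np.exp(d))
    return i1, i2


def ota_output_current(v1, v2, i_b, eta=1.3):
    """Eq. (3): Iout = I1 - I2 = Ib * tanh((V1 - V2) / (2 * eta * U_T))."""
    return i_b * np.tanh((v1 - v2) / (2.0 * eta * U_T))


def transconductance(i_b, eta=1.3):
    """Small-signal transconductance gm = Ib / (2 * eta * U_T) from Eq. (3)."""
    return i_b / (2.0 * eta * U_T)


def wab_output(voltages, gms):
    """Eq. (4): gm-weighted average of the input voltages produced by the WAB."""
    v = np.asarray(voltages, dtype=float)
    g = np.asarray(gms, dtype=float)
    return float(np.dot(g, v) / np.sum(g))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical mismatch: each neuron gets its own bias current and reference voltage.
    n_neurons = 5
    i_b = 1e-9 * rng.lognormal(mean=0.0, sigma=0.3, size=n_neurons)
    v_ref = 0.6 + 0.02 * rng.standard_normal(n_neurons)
    v_in = np.linspace(0.5, 0.7, 5)
    for k in range(n_neurons):
        curve = ota_output_current(v_in, v_ref[k], i_b[k])  # one tuning curve
        print(f"neuron {k}: " + " ".join(f"{c:+.2e}" for c in curve))
```

The two branch currents always sum to Ib, and for |V1 − V2| much smaller than 2ηUT the tanh reduces to the linear gm(V1 − V2) regime. The per-neuron spread in Ib and Vref shifts the amplitude and offset of each tuning curve, qualitatively like the curves in Fig. 2d,e.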
Figure 2

Hidden-layer neuron architecture of the TAB SISO and MISO systems. (a) Operational transconductance amplifier (OTA), the building block of the MISO TAB16. (b) Weighted average block (WAB): schematic of the weighted average circuit for multiple inputs. (c) Neuronal non-linearity block: schematic of the stage that applies a non-linearity to the output of the WAB16. (d) Tuning curves of 20 randomly chosen neurons out of the 456 neurons of the SISO TAB, when both the bias and reference voltages of all neurons are varied. (e) Tuning curves of 100 neurons for two inputs (MISO).

Synaptic memtransistor

After the encoding scheme is implemented by the TAB, the ensemble activity of the neurons is decoded using trainable weight blocks. Here, ultra-low-power two-dimensional MoS2-based synaptic memtransistors implement the trainable weight blocks, with the weights stored as the memductance of the memtransistors. These weights are updated to reduce the mean square error. In the fabricated synaptic memtransistor, we observed hysteretic switching with a near-ideal subthreshold swing of 80 mV/decade (Fig. 3). This hysteresis is caused by charge tunneling through the hBN and is used to emulate synaptic plasticity with energy dissipation below 0.3 pJ for long-term potentiation (LTP) and 20 pJ for long-term depression (LTD).

Figure 3

Memtransistor characteristics. (a) Memductance (Msd) for consecutive positive input pulses with 100 ms pulse width and 500 ms pulse separation at Vsd = 50 mV. (b) Memductance (Msd) for consecutive negative input pulses with 100 ms pulse width and 500 ms pulse separation at Vsd = 50 mV. (c) Output current for both negative and positive input voltage pulses. (d) Sweep-rate dependence of the hysteresis in the transfer characteristics of a typical device, measured at different sweeping rates of the back gate voltage (Vg). The negligible change in hysteresis window size with sweep rate indicates the absence of slow defect-based charge trapping processes in the MoS2 floating gate devices. (e) Time series of the drain current (Isd) for potentiation (negative) and depression (positive) pulses. The absolute and percentage changes in drain current are shown in the respective sections. Pulses of amplitude −4 V and +3 V were used for potentiation and depression, respectively, with a pulse width of 100 ms in both cases. (f) Optical micrograph of a trilayer MoS2/hBN/graphite heterostructure transferred onto a Si++/SiO2 (285 nm) substrate; (inset) the final device after electron beam lithography and metallization. (g) Schematic of the device structure.

Figure 3a,b show the variation in memductance Msd for applied positive and negative pulses, respectively, with a pulse interval of 500 ms. Figure 3c shows the output current versus input voltage (pulse) for negative and positive pulses: applying negative pulses increases the memductance and hence the output current. Figure 3d shows the sweep-rate dependence of the hysteresis in the transfer characteristics, measured at different sweeping rates of the back gate voltage (Vg). The negligible change in hysteresis window size with sweep rate indicates the absence of slow defect-based charge trapping processes in the MoS2 floating gate devices. Furthermore, the plasticity of vertical charge transfer in the memtransistor allows non-volatile conductance changes under pulsed gate operation, as in biological synapses, where excitatory and inhibitory pre-synaptic pulses increase and reduce the synaptic conductance, respectively. A detailed investigation of the retention, robustness, endurance, and switching variability of the various conductance states in the MoS2 FG synaptic devices is provided in our previous communication24. Here, the gate acts as the pre-synaptic terminal and controls the conductance of the MoS2 channel (synapse) through a sequence of pulses. Potentiation (increase in conductance) and depression (decrease in conductance) are performed by applying short voltage pulses to the gate terminal while simultaneously tracking the change in drain current. As shown in Fig. 3e, the channel conductance increases with every excitatory pulse, following an approximately linear pattern, and decreases on application of an inhibitory pulse. Figure 3f,g show the optical micrograph and schematic of the fabricated device, respectively. Details of the implementation are given in the Methods section.
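The approximately linear conductance change per pulse in Fig. 3e suggests a simple behavioral model for using the device as a trainable weight. The sketch below assumes a constant memductance step per pulse, clipped between hypothetical minimum and maximum values; real devices show step-size variation and non-linearity near the bounds, which this model ignores. It also tallies programming energy using the per-pulse upper bounds quoted above (0.3 pJ for LTP, 20 pJ for LTD), so the tally is an upper bound, not a measurement.

```python
from dataclasses import dataclass


@dataclass
class PulsedMemtransistor:
    """Hypothetical linear, bounded model of pulse-programmed memductance.

    Negative gate pulses potentiate (raise memductance) and positive gate
    pulses depress it, as in Fig. 3e. Step sizes and bounds are illustrative
    placeholders; `energy` accumulates per-pulse upper bounds.
    """

    g: float                 # present memductance (S)
    g_min: float             # lowest reachable memductance (S)
    g_max: float             # highest reachable memductance (S)
    dg_pot: float            # memductance increase per potentiation pulse (S)
    dg_dep: float            # memductance decrease per depression pulse (S)
    e_pot: float = 0.3e-12   # energy bound per potentiation pulse (J)
    e_dep: float = 20e-12    # energy bound per depression pulse (J)
    energy: float = 0.0      # accumulated programming-energy bound (J)

    def apply_pulse(self, polarity: int) -> float:
        """Apply one gate pulse: polarity < 0 potentiates, > 0 depresses, 0 does nothing."""
        if polarity < 0:
            self.g = min(self.g + self.dg_pot, self.g_max)
            self.energy += self.e_pot
        elif polarity > 0:
            self.g = max(self.g - self.dg_dep, self.g_min)
            self.energy += self.e_dep
        return self.g

    def read_current(self, v_sd: float = 0.05) -> float:
        """Source-drain read current at bias v_sd (default 50 mV, as in Fig. 3a,b)."""
        return self.g * v_sd


if __name__ == "__main__":
    dev = PulsedMemtransistor(g=1e-7, g_min=1e-8, g_max=1e-6, dg_pot=1e-8, dg_dep=2e-8)
    for _ in range(10):
        dev.apply_pulse(-1)  # ten potentiation pulses
    for _ in range(3):
        dev.apply_pulse(+1)  # three depression pulses
    print(f"G = {dev.g:.2e} S, I = {dev.read_current():.2e} A, E <= {dev.energy * 1e12:.1f} pJ")
```

With these placeholder steps, ten potentiation pulses followed by three depression pulses leave the memductance at 1.4e-7 S, with a programming-energy bound of 63 pJ dominated by the depression pulses.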

Sign-based online update learning (SOUL)

A simple and hardware-friendly learning algorithm, SOUL was used to update the values of memductance. The SOUL3a,b. The continuous tunneling of charge per pulse leads to a cumulative increase in the screening electric field, which manifests itself in a linear increase or decrease in channel conductance depending on the type (positive or negative) of charge tunneling into the floating gate.
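The detailed SOUL update rule is given in ref. 28. The architecture overview describes it as updating the weights according to the sign of the output error and the sign of each hidden neuron, with a threshold error and gain control as hyper-parameters, which suggests a sign-sign form of the online update. The sketch below is a hypothetical reading of that description, not the published rule: the dead zone `threshold`, the fixed `step` (a stand-in for gain control), and the random linear teacher used for the demonstration are assumptions, and signed weights are kept as plain floats rather than mapped onto positive memductances.

```python
import numpy as np


def soul_step(wd, h, y_target, threshold, step):
    """One sign-based online update of the decoding weights (hypothetical sketch).

    The error e = y_target - wd . h is computed; if |e| <= threshold no update
    is made. Otherwise every weight moves by a fixed `step` in the direction
    sign(e) * sign(h_i), which in hardware would correspond to one potentiating
    or depressing pulse per device.
    """
    e = y_target - float(np.dot(wd, h))
    if abs(e) <= threshold:
        return wd, e
    return wd + step * np.sign(e) * np.sign(h), e


def train_soul(h_all, y_all, epochs, threshold, step):
    """Run soul_step over the dataset for several epochs, starting from zero weights."""
    wd = np.zeros(h_all.shape[1])
    for _ in range(epochs):
        for h, y in zip(h_all, y_all):
            wd, _ = soul_step(wd, h, y, threshold, step)
    return wd


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    h_all = rng.uniform(-1.0, 1.0, size=(50, 10))  # stand-in hidden activities
    w_true = rng.normal(size=10)                    # hypothetical linear teacher
    y_all = h_all @ w_true
    wd = train_soul(h_all, y_all, epochs=200, threshold=0.01, step=0.01)
    mse0 = np.mean(y_all ** 2)                      # error with all-zero weights
    mse = np.mean((h_all @ wd - y_all) ** 2)
    print(f"MSE before: {mse0:.3f}, after: {mse:.4f}")
```

Because each update needs only two signs and a fixed step, it maps naturally onto single programming pulses of fixed polarity, and the dead zone avoids spending pulse energy once the error is already small.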

Electrical measurements for potentiation and depression of channel conductance

Potentiation and depression gate voltage pulses were applied using a DS 345 synthesized function generator (Stanford Research Systems). The drain voltage (a 226.7 Hz sinusoidal signal) was supplied by an SR 830 lock-in amplifier (Stanford Research Systems), while the current at the source terminal was measured using the internal DAC of the same lock-in amplifier.