Introduction

Methamphetamine (MA) is a central nervous system stimulant that can cause serious mental symptoms, such as hallucinations and delusions. Mild MA addiction can cause anxiety and related emotional disturbances, while severe MA addiction can cause depression and suicidal tendencies (Zweben et al. 2004). In addition, some MA abusers suffer from psychosis and schizophrenia (Liu et al. 2017). Unfortunately, even though the drug is known to be extremely harmful to humans, little can be done about its addiction (Mooney et al. 2014). Worse still, unlike heroin and cocaine, MA formulas are available via the Internet and the drug can be produced from common daily necessities (Lineberry and Bostwick 2006), which has accelerated its spread.

Electroencephalography (EEG) signals reflect the electrophysiological activity of nerve cells in the brain. In clinical practice, EEG signals have been shown to carry a large amount of physiological and pathological information (Zhong 2002). Such information can often give more details on a person's physical condition, which has profound significance for both the prevention and treatment of diseases. In 1932, Dietsch first used the Fourier transform to analyze collected EEG signals (Dietsch 2002). Since then, many analytical methods have been applied to study EEG signals, including time-domain and frequency-domain analysis, wavelet transform, artificial neural network (ANN) analysis and nonlinear dynamics analysis (Yousefi et al. 2022; Rafik and Ilyes 2023).

MA primarily affects the central nervous system. Therefore, consuming MA would lead to abnormal EEG signals of drug addicts (Prabhat et al. 2022; Gege et al. 2023). In this study, we attempted to record the EEG signals of MA abusers and analyze the differences in these signals between MA abusers and normal persons, which would support the exploration of the principle of action and treatment methods of MA.

As an important indicator of EEG signals, P300 is related to selective attention, memory updating, motivation, stimulus significance and the activation of inhibition processes (Turnip et al. 2013). Up to now, many studies have intensively investigated the detection of MA abuse using P300 (**xiang et al. 2013; Zhong et al. 2020; Huang et al. 2023). MA abusers were asked to perform the same task as normal subjects, and it was shown that the P300 component can be successfully used to differentiate MA abusers from healthy subjects (Haifeng et al. 2015; Shuguang et al. 2018). With the advancements in neuroscience, researchers have developed a variety of feature extraction approaches for EEG signals to predict human psychiatric disorders (Shahmohammadi et al. 2016; Ahmadlou et al. 2013). However, little work has so far been devoted to extracting the EEG P300 features of MA abusers to build an automatic classification system for detecting MA abuse.

EEG signals are non-stationary random signals (Shin et al. 2015); thus, the conventional Fourier transform can only reveal which frequency components an EEG signal is composed of, while the corresponding time information of each frequency component cannot be obtained. This means that multiple different time-domain signals can correspond to the same frequency-domain representation. Hence the wavelet transform, which can decompose signals at different resolutions (Zhang 2020), is needed. By scaling and translating the basis function, the analysis window becomes narrower at high frequencies and wider at low frequencies, so that accurate frequency and time information can be obtained. Owing to this superiority in signal analysis, the wavelet transform is known as the "digital microscope" (Yu-Sheng et al. 2013). In view of this advantage, the wavelet transform was used here to extract the time-frequency features of P300 signals, with the wavelet coefficients serving as the main features (** et al. 2019).

Classical machine learning algorithms are commonly used methods for neurophysiological signal analysis and pattern recognition. The hypothesis in this study is that P300 signals in cognitive processes can demonstrate significant differences between MA abusers and normal persons. By extracting P300 features, the machine learning algorithm is able to recognize MA abusers.

The main contributions of this work are listed as follows:

(1) For the first time, we used the features of the P300 component of EEG signals to detect MA abusers. Compared with other methods, the P300 component shows larger differences between healthy people and MA abusers, and its features are simpler to extract.

(2) MA addiction could be detected through a few stimulation experiments, and the proposed model improved the detection efficiency.

(3) Compared with biochemical detection methods, the detection results of our model were more reliable.

The remainder of this article is organized as follows. Section 1 gives a short introduction to the research overview and background. Section 2 presents related work on EEG-based MA classification. Section 3 describes the experimental methods and the data-analysis process. Section 4 presents the final results and related discussion, and Section 5 concludes the paper.

Related Work

EEG-Based Research of Methamphetamine

MA can lead to substance use disorder (SUD), directly affects the central nervous system and imposes an enormous burden on the human body. The influence of MA on the brain can be mapped through brain activity. EEG can reflect the activation process of the brain under various stimulations: it amplifies tiny bioelectrical signals into a curvilinear record, capturing neural activity at millisecond resolution. This excellent temporal resolution enables more precise analysis of neural activity.

Many studies have extensively investigated the effects of methamphetamine using EEG signals (Di et al. 2018).

Fig. 1

Schematic illustration of the behavioral procedure and stimulus examples in a trial

Data Recording and Preprocessing

Continuous EEG was recorded using the Brain Vision Recorder 2.0 system (Brain Products Company, Munich, Germany). During recording, FCz was used as the reference electrode and AFz as the ground electrode. An electrode placed approximately 2 cm below the right eye and centered under the pupil was used to record the vertical electrooculogram (EOG). The recorded EEG signals were amplified and digitized at a sampling rate of 1000 Hz in DC acquisition mode, and the electrode impedance was required to be below 10 k\(\Omega\).

The data were processed offline after recording, using the EEGLAB and ERPLAB toolboxes on the MATLAB platform. Data were re-referenced to the average of the mastoid electrodes, down-sampled to 250 Hz, and filtered with a 30 Hz low-pass filter. Artifacts such as spikes, EEG drift and abiotic signals were removed manually. Ocular interference was then eliminated through independent component analysis (ICA). Each epoch ranged from \(-200\) to 1000 ms, and the 200 ms before stimulus presentation was used for baseline correction. The event-related potential (ERP) of each channel under the three stimulations after preprocessing is shown in Fig. 2. As shown in Fig. 2, there was a large difference in the ERP between the two groups, especially in the P300 under the S2 stimulation. Therefore, only the epochs of S2 were extracted, and every five epochs of each subject were overlapped and averaged. Finally, 144 groups of data were obtained in the addiction and healthy groups, respectively.

Fig. 2

ERP waveform of each channel under addiction stimulation

Discrete Wavelet Transforms (DWT) for Signal Analysis

Most biological signals in nature are non-stationary random signals; hence, the wavelet transform is widely used in the biomedical signal field. In this work, the time-domain features, frequency-domain features and wavelet coefficients of the P300 elicited by the S2 signal were extracted for classification.

The wavelet transform is operated by two basic functions: scale function \(\phi (t)\) and wavelet function \(\psi (t)\), which are the prototype forms of the following class of orthonormal basis functions, respectively:

$$\begin{aligned}{} & {} {\phi _{j,k}}(t) = {2^{j/2}}\phi ({2^j}t - k);j,k \in Z \end{aligned}$$
(1)
$$\begin{aligned}{} & {} {\psi _{j,k}}(t) = {2^{j/2}}\psi ({2^j}t - k);j,k \in Z \end{aligned}$$
(2)

where k controls the translation of the wavelet base in the time domain, j denotes the parameter in the frequency domain, which determines the frequency features of the wavelet base, and Z is a set of integers.

The complete wavelet expansion f(t) is defined by the wavelet function and the scale function, as follows:

$$\begin{aligned} f(t) = \sum \limits _{k \in Z} {c({j_0},k)} {\phi _{{j_0},k}}(t) + \sum \limits _{j \ge {j_0}} {\sum \limits _{k \in Z} {d(j,k)} } {\psi _{j,k}}(t) \end{aligned}$$
(3)

where the coefficients \(c(j_0,k)\) and \(d(j,k)\) are calculated by inner products as follows:

$$\begin{aligned}{} & {} c({j_0},k) = \left\langle {f(t),{\phi _{{j_0},k}}(t)} \right\rangle \end{aligned}$$
(4)
$$\begin{aligned}{} & {} d(j,k) = \left\langle {f(t),{\psi _{j,k}}(t)} \right\rangle \end{aligned}$$
(5)

This is the final and core form of the wavelet transform (Gandh et al. 2010). In this study, the quadratic B-spline function (Gan et al. 2017) was selected as the mother wavelet. The Nth-level approximation coefficients can be calculated by wavelet decomposition; given the 250 Hz sampling rate, a fifth-level wavelet decomposition was performed. In the wavelet-transform multi-resolution algorithm, the low-pass (LP) and high-pass (HP) filters are derived from the same wavelet coefficients. The LP filter is related to the scaling function, and its outputs are called the approximation coefficients (A); the HP filter is connected with the wavelet function, and its outputs are called the detail coefficients (D).
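As an illustration, the multi-resolution filter-bank decomposition described above can be sketched as follows. This is a minimal sketch, not the study's implementation: for simplicity it uses the Haar filter pair instead of the quadratic B-spline wavelet, a synthetic 2 Hz tone instead of a real epoch, and plain convolution without boundary handling.

```python
import numpy as np

def dwt_level(x, lp, hp):
    """One analysis step: filter with LP/HP, then downsample by 2."""
    a = np.convolve(x, lp)[::2]   # approximation coefficients (A)
    d = np.convolve(x, hp)[::2]   # detail coefficients (D)
    return a, d

def wavedec(x, lp, hp, level):
    """Repeatedly decompose the approximation branch, as in Mallat's algorithm."""
    details = []
    a = x
    for _ in range(level):
        a, d = dwt_level(a, lp, hp)
        details.append(d)
    return a, details

# Haar filter pair, used here purely for illustration
# (the paper uses a quadratic B-spline mother wavelet instead).
lp = np.array([1.0, 1.0]) / np.sqrt(2)
hp = np.array([1.0, -1.0]) / np.sqrt(2)

fs = 250.0  # sampling rate after down-sampling
epoch = np.sin(2 * np.pi * 2 * np.arange(int(fs)) / fs)  # synthetic 2 Hz tone

# After 5 levels at 250 Hz, the approximation A5 spans roughly 0-3.9 Hz,
# i.e. the delta band containing the P300 energy.
a5, details = wavedec(epoch, lp, hp, level=5)
```

Each level halves the analyzed band, so five levels take the approximation branch from 0-125 Hz down to roughly 0-3.9 Hz.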

Feature Extraction

The frequency range of P300 has been confirmed to lie in the delta band (Gao et al. 2010). Therefore, using wavelet decomposition, we calculated the wavelet coefficients of the delta band as the wavelet features. Furthermore, time- and frequency-domain features of each signal X(t), which have previously been used for classification (Gao et al. 2014), were also added. The following time-domain features were calculated:

(1) Maximum amplitude (MAA): the maximum amplitude of X(t), calculated as follows:

    $$\begin{aligned} MAA = \max \{ X(t)\} \end{aligned}$$
    (6)
(2) Minimum amplitude (MIA): the minimum amplitude of X(t), calculated as follows:

    $$\begin{aligned} MIA = \min \{ X(t)\} \end{aligned}$$
    (7)
(3) Latency (LAT): the time at which the MAA of X(t) occurs, calculated as follows:

    $$\begin{aligned} LAT = \{ X(LAT) = MAA\} \end{aligned}$$
    (8)
(4) Ratio latency to maximum (RLM): the ratio of the maximum amplitude of X(t) to its latency, calculated as follows:

    $$\begin{aligned} RLM = \frac{{MAA}}{{LAT}} \end{aligned}$$
    (9)
(5) Positive area (PA): the sum of the positive signal values of X(t), calculated as follows:

    $$\begin{aligned} PA = \sum \limits _{t1}^{t2} {\frac{{X(t) + \left| {X(t)} \right| }}{2}} \end{aligned}$$
    (10)

    where t1 and t2 denote the initial time value and cut-off time value of P300, respectively.

(6) Difference between positive and negative amplitudes (DPN), calculated as follows:

    $$\begin{aligned} DPN = MAA - MIA \end{aligned}$$
    (11)

Let Y(f) be the power spectral density of X(t). Then, the following calculation methods can be used to extract the frequency domain features:

(1) Maximum frequency (MF): the frequency at which Y(f) reaches its maximum, calculated as follows:

    $$\begin{aligned} MF = \{ Y(MF) = Y{(f)_{\max }}\} \end{aligned}$$
    (12)
(2) Average frequency (AF): calculated as a frequency-weighted average, where the weighting coefficient is the value of Y(f):

    $$\begin{aligned} AF = \frac{{\int _0^{125} {f \times Y(f)df} }}{{\int _0^{125} {Y(f)df} }} \end{aligned}$$
    (13)

A total of 31 features under each channel were involved in the classification, and an EEG sample with features from 62 channels was a 1922-dimensional vector. The entire feature extraction mechanism is summarized in Fig. 3, and the pseudo-code of the algorithm with feature extractor is shown in Algorithm 1.
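The eight time- and frequency-domain features above can be computed directly from an epoch. The following is an illustrative sketch only (the function name, the synthetic test signal, and the use of a simple periodogram as the power spectral density estimate Y(f) are our own choices, not the study's code):

```python
import numpy as np

def time_freq_features(x, fs=250.0, t1=None, t2=None):
    """Compute the eight time/frequency-domain P300 features described in the text.

    x: one epoch (1-D array); fs: sampling rate;
    t1/t2: sample indices bounding the P300 window used for the positive area.
    """
    t = np.arange(x.size) / fs
    maa = x.max()                                  # Eq. (6): maximum amplitude
    mia = x.min()                                  # Eq. (7): minimum amplitude
    lat = t[np.argmax(x)]                          # Eq. (8): latency of the maximum
    rlm = maa / lat if lat > 0 else np.nan         # Eq. (9): MAA / LAT
    sl = slice(t1, t2)
    pa = np.sum((x[sl] + np.abs(x[sl])) / 2)       # Eq. (10): positive area in [t1, t2]
    dpn = maa - mia                                # Eq. (11): amplitude range

    # Periodogram as a simple PSD estimate Y(f); frequencies run 0 .. fs/2 = 125 Hz.
    Y = np.abs(np.fft.rfft(x)) ** 2 / (fs * x.size)
    f = np.fft.rfftfreq(x.size, d=1 / fs)
    mf = f[np.argmax(Y)]                           # Eq. (12): frequency of max power
    af = np.sum(f * Y) / np.sum(Y)                 # Eq. (13): power-weighted mean freq
    return dict(MAA=maa, MIA=mia, LAT=lat, RLM=rlm, PA=pa, DPN=dpn, MF=mf, AF=af)
```

For instance, applying the sketch to a pure 10 Hz sine sampled at 250 Hz yields MF and AF close to 10 Hz.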

However, a large amount of repetitive or redundant information exists in EEG signals (Amin et al. 2015), so there are strong correlations between the features, which can weaken the generalization ability of the model and reduce the classification accuracy. Thus, the F_score was used to select the features with the best discriminative power.

Fig. 3

The block diagram for the entire processing pipeline of feature extraction

Algorithm 1

Design of the feature extractor

F_score for Feature Selection

F_score can measure the discrimination of two sets of features (**e et al. 2010). Let \({x_k} \in {R^m},k = 1,2,\ldots ,n\), be the given recording set, and let the numbers of positive and negative samples be \({n_ + }\) and \({n_ -}\), respectively. Then, the F_score of the ith feature of the training samples is defined as follows:

$$\begin{aligned} F(i) = \frac{{{{(\overline{{x_i}^{( + )}} - \overline{{x_i}} )}^2} + {{(\overline{{x_i}^{( - )}} - \overline{{x_i}} )}^2}}}{{\frac{1}{{{n_ + } - 1}}\sum \limits _{k = 1}^{{n_ + }} {{{({x_{k,i}}^{( + )} - \overline{{x_i}^{( + )}} )}^2}} + \frac{1}{{{n_ - } - 1}}\sum \limits _{k = 1}^{{n_ - }} {{{({x_{k,i}}^{( - )} - \overline{{x_i}^{( - )}} )}^2}} }} \end{aligned}$$
(14)

where \({{x_{k,i}}^{( + )}}\) and \({{x_{k,i}}^{( - )}}\) are the kth positive and negative sample values of the ith feature, respectively, and \({\overline{{x_i}} }\), \({\overline{{x_i}^{( + )}} }\) and \({\overline{{x_i}^{( - )}} }\) represent the averages of the whole, positive and negative data sets. The degree of discrimination of a feature is determined by the value of its F_score (Chen and Lin 2006).

In this study, the F_score was calculated for all features, and features scoring below the average were removed. The remaining features were placed into the feature set in descending order of F_score to form the final feature set. The feature selection algorithm is shown in Algorithm 2.
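A vectorized sketch of Eq. (14) and the mean-threshold selection step is given below (the function names and array layout are illustrative assumptions, not the study's Algorithm 2):

```python
import numpy as np

def f_score(X_pos, X_neg):
    """F_score of each feature column, following Eq. (14) (Chen and Lin 2006).

    X_pos: (n+, m) matrix of positive samples; X_neg: (n-, m) matrix of negatives.
    Returns an array of m scores, one per feature.
    """
    X_all = np.vstack([X_pos, X_neg])
    # numerator: squared distances of class means from the overall mean
    num = (X_pos.mean(0) - X_all.mean(0)) ** 2 + (X_neg.mean(0) - X_all.mean(0)) ** 2
    # denominator: sum of the within-class sample variances (ddof=1 gives 1/(n-1))
    den = X_pos.var(0, ddof=1) + X_neg.var(0, ddof=1)
    return num / den

def select_features(X_pos, X_neg):
    """Keep features scoring above the mean F_score, sorted in descending order."""
    scores = f_score(X_pos, X_neg)
    keep = np.where(scores > scores.mean())[0]
    return keep[np.argsort(scores[keep])[::-1]]
```

A feature whose class means are well separated relative to its within-class spread receives a high score and survives the mean threshold.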

Algorithm 2

Feature selection

Classification

LSTM has performed well in a variety of applications, including image recognition, text analysis and disease prediction (Zhang et al. 2019). LSTM is an improved variant of the recurrent neural network (RNN): through gate controllers, it weights memories from different moments. The memory unit structure of LSTM is shown in Fig. 4.

Fig. 4

The memory unit structure of BiLSTM

c represents the memory state, which is the most important part of the entire memory unit (Houdt et al. 2020). As shown in Fig. 4, it is transferred directly along the entire structural chain using only a small number of linear operations, so the information remains substantially unchanged during transmission. Meanwhile, the memory unit contains three "gate" structures that control the flow of information: the forget gate, the input gate and the output gate. According to the choice of the gates, information in the memory state can be added or deleted. Each gate consists of a point-wise multiplication operation and a sigmoid function, with the following main parts:

\({C_{t - 1}}\) represents the memory state at time \(t-1\), which records historical information of all time steps. It belongs to the long-term memory of the model.

\({h_{t - 1}}\) represents the output at time \(t-1\), which mainly records the current time-step information, thus belonging to the short-term memory of the model.

The forget gate \(f_t\), input gate \(i_t\) and output gate \(o_t\) take values between 0 and 1. The memory state C of the jth memory unit at time t is computed from the input gate \(i_t^j\), forget gate \(f_t^j\) and previous memory state \(C_{t-1}^j\) (Zaremba et al. 2014). The memory state C at time t is defined as:

$$\begin{aligned} C_t^j = f_t^j \times C_{t - 1}^j + i_t^j \times {\tilde{C}}_t^j \end{aligned}$$
(15)

such that

$$\begin{aligned} \left\{ \begin{array}{l} {f_t} = \sigma ({W_f}[{h_{t - 1}},{x_t}] + {b_f})\\ {i_t} = \sigma ({W_i}[{h_{t - 1}},{x_t}] + {b_i})\\ {{{\tilde{C}}}_t} = \tanh ({W_C}[{h_{t - 1}},{x_t}] + {b_C}) \end{array} \right. \end{aligned}$$
(16)

where \(W\), b, \(\sigma\) and \(x_t\) represent the weight matrices, bias parameters, sigmoid activation functions corresponding to each gate, and the input of the model at time t, respectively. When the memory unit is updated, the output gate \(o_t\) and the hidden layer \(h_t\) can be represented as:

$$\begin{aligned} \left\{ \begin{array}{l} {o_t} = \sigma ({W_O}[{h_{t - 1}},{x_t}] + {b_O})\\ {h_t} = {o_t} \times \tanh ({C_t}) \end{array} \right. \end{aligned}$$
(17)
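Eqs. (15)-(17) amount to a single step of the following cell. This is a didactic sketch with a generic weight layout over the concatenated \([h_{t-1}, x_t]\), not the trained network used in the study:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, W, b):
    """One LSTM step implementing Eqs. (15)-(17).

    W: dict of weight matrices ('f', 'i', 'C', 'o'), each acting on [h_prev; x_t];
    b: dict of matching bias vectors.
    """
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W['f'] @ z + b['f'])          # forget gate, Eq. (16)
    i = sigmoid(W['i'] @ z + b['i'])          # input gate, Eq. (16)
    c_tilde = np.tanh(W['C'] @ z + b['C'])    # candidate memory, Eq. (16)
    c = f * c_prev + i * c_tilde              # memory update, Eq. (15)
    o = sigmoid(W['o'] @ z + b['o'])          # output gate, Eq. (17)
    h = o * np.tanh(c)                        # hidden state, Eq. (17)
    return h, c
```

Because the memory update in Eq. (15) is element-wise and linear in \(C_{t-1}\), gradients can flow along the memory chain with little attenuation, which is the core advantage over a plain RNN.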

Traditional LSTM networks use only historical context information; the lack of future feature information may lead to an incomplete feature matrix. BiLSTM realizes bi-directional feature reading by combining forward and backward LSTM layers, thereby fully exploiting the contextual feature information (Zhang et al. 2020). The model structure is shown in Fig. 5.

Fig. 5

The architecture of the BiLSTM model

At time t, let \(\uparrow {h_t}\) be the hidden state of the forward LSTM and \(\downarrow {h_t}\) the hidden state of the backward LSTM; then \(h_t\) is calculated as follows:

$$\begin{aligned} {h_t} = \uparrow {h_t} \oplus \downarrow {h_t} \end{aligned}$$
(18)

Studies have shown that the BiLSTM model, which incorporates the overall context, outperforms other classification algorithms (Wang et al. 2016).

Cross-Validation

To ensure that all data could be used for classification, we performed 12-fold cross-validation (CV) (Xu and Goodacre 2018). First, we randomly divided all the feature samples into 12 sample sets. Within these 12 sets, 11 were used to train the model (denoted \({D_{tra}}\)), while the 12th was used to test the model (denoted \({D_{tes}}\)). We repeated this process 12 times, with each set used once for testing. Furthermore, we applied an additional 8-fold cross-validation on each \({D_{tra}}\), randomly dividing it into 8 subsets, where 7 served as training subsets (\({D_{s\_tra}}\)) and the 8th served as the validation subset (\({D_{s\_val}}\)). \({D_{s\_tra}}\) and \({D_{s\_val}}\) were then passed to the classifier for training and for validating its performance. This process was repeated 8 times, with each subset used once for validation, to mitigate overfitting. We used the following performance measures for classification:

(a) Balanced validation accuracy (BVA): for each \({D_{tra}}\), the BVA was obtained by averaging the 8 pairs of sensitivities and specificities (8-fold CV). By comparing the classification results of different feature sets, the best BVA was found, and the optimal classifier (and its parameters) was taken as the one with the highest BVA. It is worth noting that the best result may differ across different sets \({D_{tra}}\).

(b) Balanced testing accuracy (BTA): the BTA was obtained by averaging the sensitivity and specificity of the fully trained classifier on the test set. Through the 12-fold CV, we obtained the average of all 12 BTAs as the ABTA (average BTA). The final optimal classifier (and its parameters) was the one whose BTA was the highest among all 12 BTAs.

The cross-validation procedure is illustrated in Fig. 6. The feature sample set was formed after the F_score feature selection operation, and after the two layers of CV, the best training model and the average test accuracy were obtained.
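The 12x8 nested splitting described above can be sketched as an index generator. The fold construction and seed below are illustrative assumptions; the study's actual partitioning code is not shown:

```python
import numpy as np

def kfold_indices(n, k, rng):
    """Random k-fold split of n samples: list of (train_idx, test_idx) pairs."""
    idx = rng.permutation(n)
    folds = np.array_split(idx, k)
    return [(np.concatenate(folds[:i] + folds[i + 1:]), folds[i]) for i in range(k)]

def nested_cv(n, outer_k=12, inner_k=8, seed=0):
    """Sketch of the 12-fold outer / 8-fold inner CV.

    Yields (D_s_tra, D_s_val, D_tes) index triples: the inner training and
    validation subsets, plus the held-out outer test set.
    """
    rng = np.random.default_rng(seed)
    for tra, tes in kfold_indices(n, outer_k, rng):       # outer 12-fold CV
        for s_tra, s_val in kfold_indices(len(tra), inner_k, rng):  # inner 8-fold CV
            yield tra[s_tra], tra[s_val], tes
```

Each outer test set is never seen by the inner loop, so model selection on \({D_{s\_val}}\) cannot leak information into the reported test accuracy.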

Fig. 6

Flow chart of the classification. The operation results and data are shown in italics. CV I: the first-layer cross-validation, which tests the optimal model obtained from the second-layer CV training. CV II: the second-layer cross-validation, which trains and validates the best model. The two orange branches represent the input of the training and testing sets in the first-layer CV, respectively; the red lines play the same role in the second-layer CV

Statistical Evaluation of Performance

For classification models, the test accuracy is an intuitive evaluation indicator. However, accuracy alone does not always fully characterize how good a model is. Therefore, the performance of the classifier was also evaluated with the following specificity (Spe) and sensitivity (Sen) metrics (Subasi 2007):

$$\begin{aligned}{} & {} Sensitivity = \frac{{TP}}{{(TP + FN)}} \times 100\mathrm{{\% }} \end{aligned}$$
(19)
$$\begin{aligned}{} & {} Specificity = \frac{{TN}}{{(TN + FP)}} \times 100\mathrm{{\% }} \end{aligned}$$
(20)

where sensitivity is the true positive rate, reflecting the probability of missed diagnosis, and specificity is the true negative rate, reflecting the probability of misdiagnosis. The following statistics are used to calculate specificity and sensitivity:

TP (True Positive): the number of addiction cases identified by the classifier, which are actually addiction cases.

FN (False Negative): the number of healthy cases identified by the classifier, which are actually addiction cases.

TN (True Negative): the number of healthy cases identified by the classifier, which are actually healthy cases.

FP (False Positive): the number of addiction cases identified by the classifier, which are actually healthy cases.
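From these four counts, Eqs. (19) and (20) reduce to a two-line computation:

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity and specificity (Eqs. 19-20), returned as percentages."""
    sen = tp / (tp + fn) * 100  # true positive rate
    spe = tn / (tn + fp) * 100  # true negative rate
    return sen, spe
```

For example, a classifier that finds 90 of 100 addiction cases and 80 of 100 healthy cases has a sensitivity of 90% and a specificity of 80%.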

Results

Classifier Performance

In this study, 288 cases were classified, including 144 in the addiction group and 144 in the healthy group.

The time-domain features, frequency-domain features and wavelet coefficients of the delta band were used as classification features, and the features of the 62 channels were concatenated. To improve the performance of the classifier and reduce the computation and experiment time, the F_score was used to evaluate the discriminative power of the features. The features were then arranged in descending order, and classification was performed. In this study, the classification accuracies of SVM and BiLSTM were compared. To increase comparability, all BiLSTM and SVM classifiers used the sigmoid activation function. The test accuracy of the 12-fold cross-validation was calculated, and a violin plot was drawn (see Fig. 7). It can be seen that the classification results obtained by BiLSTM in cross-validation are more stable than those of SVM.

Fig. 7

Violin plot of the test accuracy with the average accuracy of different classifiers

Table 2 shows the performance evaluation results of the feature dimension with the highest classification accuracy after classifying the data with different classifiers. It was observed that when BiLSTM was used as the classifier, the classification accuracy was better. The P300 features of EEG signals were used to classify MA abusers and healthy subjects, and we achieved a classification accuracy rate of 83.85%.

Table 2 The evaluation indicator of the best models in different classifiers

Performance of Different Electrodes

In addition, owing to the characteristics of BiLSTM, we also compared classification using the signals of different electrodes separately. Among them, 20 electrodes achieved a classification accuracy of 60% (see Table 3). It can be seen from Table 3 that most of these electrodes were located in the frontal lobe, which is related to the advanced functions of the brain, such as judgment, decision making, thinking and executive control (Anne et al. 2012). Therefore, the signal of the frontal lobe can more clearly reflect the mental activity of the two groups during task performance. The C3 and C4 electrodes map substantially to the primary motor cortex, and the electrode signals from this region also exhibited strong differences. This may indicate that MA affects motor execution in MA abusers. Meanwhile, this phenomenon may also be related to the symptoms of patients with amphetamine psychosis, such as irritation, anxiety and psychomotor agitation.

Table 3 The classification results of different electrode signals using BiLSTM

Discussion

Identification of MA Addiction Based on EEG

Previous research indicates that men and women exhibit similar MA-related characteristics and behaviors (Brecht et al. 2004). Compared with men, women's symptoms are more specific to MA and they respond better to treatment (Dluzen and Liu 2008). Therefore, we collected EEG signals from 18 female MA abusers and 22 healthy females, processed using MATLAB R2014b and the EEGLAB toolbox. After preprocessing, we obtained 144 and 184 groups of data, respectively. Figure 2 illustrates significant differences between the signals of the two groups: the ERP amplitudes corresponding to MA-related stimuli are higher in MA addicts than in the control group. Studies have shown that MA addicts exhibit attentional bias towards MA-related cues when exposed to such stimuli (Gege et al. 2023; Di et al. 2018). Therefore, this study can provide a basis for judging and preventing MA addiction. In future studies, other components of the EEG signal, such as N300, P200 and N200, can be analyzed jointly. Although the differences between the two groups in those components are smaller than in P300, they may contain critical factors that are not present in P300.