1 Introduction

In recent years, IoT devices have continued to be deployed on a large scale worldwide, with strong growth in the number of connections, and are widely used in many areas of production and daily life. It is expected that by the end of 2023, more than 43 billion devices worldwide will be connected to the IoT. Cognitive allocation of spectrum resources and spectrum prediction in the IoT can ensure efficient communication between different devices and users and minimize interference and conflicts, and spectrum prediction is crucial for supporting the transmission of wireless communication signals [1, 2]. Because wireless channels are open by nature, unauthorized devices within signal range can intercept and steal signals while a variety of devices communicate, thereby compromising the security of the wireless IoT system. IoT networks have become targets of illegal attacks, which poses great risks and challenges to the security of wireless IoT systems. To ensure network security, intrusion detection for wireless access is necessary.

Wireless communication networks are an important part of the IoT. However, with the development of wireless communication technology, any two users in a wireless network can freely establish a connection, which not only aggravates the scarcity of spectrum resources but also leads to widespread information leakage and adversarial attack problems, such as eavesdropping attacks and spoofing attacks, reducing network availability and the security of information transmission [3]. Reference [4] proposed an information security transmission scheme for IoT terminals based on the joint optimization of drone trajectory and resource allocation in the presence of eavesdroppers. Traditional upper-layer encryption technology has many limitations, so PLS has become a potential solution to this problem. PLS plays a critical role in wireless communication and distinguishes itself from cryptographic techniques; it serves as a complementary approach to upper-layer cryptographic methods. The fundamental concept of PLS is to harness the inherent properties and vulnerabilities of wireless channels to establish secure communication at the physical layer (PHY). Many research solutions have been proposed for PLS. Reference [5] studied cognitive user imitation attacks that occur during spectrum switching in cognitive radio networks and utilized artificial intelligence techniques to make effective autonomous decisions. The PHY transmission of cognitive radio networks based on nonorthogonal multiple access faces the dual threats of primary user interference and third-party eavesdropping. An unmanned-aerial-vehicle-assisted covert communication model was proposed in [6] to address the security threats faced in air-to-ground communication, maximizing the average covert rate under illegal interception while ensuring secure transmission. A new direction in physical layer security research proposed probabilistic security metrics and the use of wireless maps to capture uncertainty in wireless environments, especially in eavesdropper channels [7]. In the presence of multiple active eavesdroppers, a closed-form expression for the optimal power allocation between the transmitted signal and artificial noise (AN) was obtained, and the minimum transmission power required to ensure reliable and secure communication was derived in [15], where the sending node attempts to transmit confidential information reliably to a legitimate receiving node while avoiding illegal attacks from eavesdropping nodes during transmission. In PLS, eavesdroppers are usually divided into two types: passive eavesdroppers, which only listen without attacking, and active eavesdroppers, which launch attacks and impersonate legitimate users. Active eavesdroppers are usually more damaging because their attacks result in greater information leakage. Active eavesdropping attacks in conventional communication systems include pilot spoofing attacks and active eavesdropping attacks during data transmission; the former differs from the latter in that the eavesdropper (E) sends the same pilot signal, synchronized with the legitimate user [16].
Active eavesdroppers during data transmission can enhance information eavesdropping by broadcasting their own continuous-wave signals [17]. A scheme based on active eavesdropping with rotated jamming to achieve wireless surveillance is considered in [18], where a legitimate eavesdropper (legitimate because, in that work, the eavesdropping link serves as a lawful surveillance link) intercepts information while an auxiliary jamming node interferes with the suspicious link, successfully achieving legitimate eavesdropping. In recent years, owing to energy constraints, for example in UAV communications, active eavesdropping techniques have received considerable attention: assuming that the suspicious link can detect wireless eavesdropping, covert surveillance is achieved through active eavesdropping [19]. It follows that effective anomaly detection is urgently needed to enhance the reliability and availability of communication systems and to minimize the probability of interception.

This paper presents a novel approach: we build a neural network model to learn from and classify the datasets under a deep learning framework and introduce discriminative loss functions into the training of the model. Compared with ML algorithms, the proposed approach learns data features more accurately, and the classification performance is significantly improved. Experiments show that our approach achieves reliable eavesdropping detection in different eavesdropping attack scenarios.

The remainder of this paper is organized as follows. Section 2 summarizes related works. Section 3 presents the system model: we introduce wireless communication systems with eavesdroppers and create a framework for wireless signal datasets. Section 4 describes detection methods based on machine learning. Section 5 presents a deep learning-based BP neural network eavesdropping detection scheme. Section 6 analyzes the detection performance of the different algorithms.

2 Related works

Intrusion detection is not only a central security problem at the network layer but also one that must be addressed at the physical layer. Common intrusion detection algorithms include ML methods such as Bayesian networks, clustering analysis, and the support vector machine (SVM), as well as DL methods such as recurrent neural networks (RNN) and CNNs for classification and prediction. This section primarily discusses detection approaches that utilize ML and DL techniques.

Machine learning techniques have been widely applied in anomaly detection, recognition, and text classification, and have achieved impressive results. Reference [20] discussed improving the intrusion detection performance of ML classifiers through feature selection in cyber-physical systems. Reference [21] proposed a new dual-ended machine learning model to improve the prediction accuracy and real-time performance of heterogeneous spectral states. In a cognitive eavesdropping environment, Reference [22] adopted distributed machine learning algorithms to optimize the allocation ratio of secondary device resources and ensure the quality of service for users with higher task priorities. The authors of [23] used the relationship between transmitted and received signals during the transmission process to build a dataset and applied the SVM algorithm to classify eavesdropping and legitimate signals, but the detection accuracy of the binary classification is not very high. Reference [24] utilized a lightweight network composed of a BP neural network, an autoregressive integrated moving average model, and an SVM to achieve intrusion detection and recognition. Reference [25] studied UAV wireless relay systems in the presence of active eavesdroppers and used one-class SVM and k-means clustering analysis to build predictive models that detect eavesdropping attacks; the study no longer considered general wireless systems but focused on UAV-assisted wireless systems, yet UAVs have limited energy and cannot directly detect eavesdropping attacks. Reference [26] used machine learning to model the actual propagation process of wireless signals and applied a Gaussian mixture model for classification, and Reference [27] relied on a Gaussian mixture model to identify spoofing attacks. Reference [31] proposed a dual denoising autoencoder approach to enhance the security of cyber-physical systems by preventing eavesdropping, and a related study appears in [33]. A complex CNN for identifying signal spectrum information for multi-signal frequency-domain detection and recognition was constructed in [34]. Deep learning and few-shot learning methods were used to identify different emitters based on RF fingerprint features in [35]. Classification tests were performed on public UCI datasets with deep convolutional neural networks, achieving good performance, in [36]. Wireless network security was pursued through a self-supervised learning and adversarial-augmentation-based few-shot SEI method for transmitter authentication in [37]. Classification algorithms based on DL have progressively gained acceptance as mainstream methodologies. However, deep learning is mostly applied to image, voice, video, text, and other data classification problems such as radio signal recognition [38], where the data are public and more complex, and there is little research that directly classifies structured feature-vector samples for eavesdropping attack detection.

In this paper, we reframe the detection problem, which is traditionally solved with ML algorithms, as a classification task in order to optimize the solution. This article therefore mainly addresses active eavesdropping detection during the wireless access process of general wireless systems. To enhance eavesdropping detection performance and signal classification accuracy, we build on the ideas of [23, 25] to generate test data from wireless signals, using statistical knowledge of channel state information (CSI) to create a wireless signal dataset framework; artificial training data is then created and fed into the ML and BP models. Based on the characteristics of the dataset, a BP neural network model built on a deep learning architecture is proposed.

3 System model and creating dataset

3.1 System model

In this paper, the problem of active eavesdropping detection is studied. The classic eavesdropping model is shown in Fig. 1a. While the source node communicates with the destination node, an unauthorized eavesdropper (referred to as E) may intercept the communication and employ deceptive techniques to mislead the destination node, thereby achieving its eavesdropping objectives. Our system model, shown in Fig. 1b, is a general wireless system in which we mainly address the problem of active eavesdropping detection. The system consists of a single access point (AP), K authorized users, and an active eavesdropper (E), as seen in Fig. 1b. Each node is equipped with a single antenna, and the node placements are randomized. The wireless channel connecting the AP and the \(k\)-th user is denoted \(g_{k}\); likewise, the channel connecting the AP and E is denoted \(g_{E}\). Wireless communication is usually divided into two phases, the uplink and the downlink.

Fig. 1 System model diagram

In the uplink, the user sends a pilot sequence to the AP to request communication, and the AP performs channel estimation and identity authentication based on the pilot sequence. Denote the pilot signal transmitted by user \(k\) to the AP as \(\textbf{p}_{k}\), where \(\textbf{p}_{k} \in \mathbb {C}^{\mathcal {L}\times 1}\) is a column vector with \(\mathcal {L}\) entries and \(\Vert \textbf{p}_{k} \Vert ^{2}=1\). The pilots of any two different users are orthogonal, i.e., \(\textbf{p}^{\dagger }_{k} \textbf{p}_{k^{'}}=0\) when \(k \ne {k}^{\prime }\). If a malicious node E launches an attack to steal the message \(s_{k}\) exchanged between user \(k\) and the AP, it designs the same pilot sequence \(\textbf{p}_{E}\) as \(\textbf{p}_{k}\) and sends it to the AP. The AP then mistakes the two transmissions for message requests from two legitimate users, so when the message is returned to the user it is also returned to E, and the confidential information is inevitably leaked to E. Moreover, the SNR of the \(k\)-th user decreases as the power of E increases.

When E appears in the uplink and acts proactively, the user's achievable data rate is reduced. Therefore, the disparity between the data rates of user \(k\) and E, i.e., the secrecy capacity, becomes lower.

In the downlink transmission, the AP disseminates signals to the legitimate recipients, and E will of course also receive these signals. Research on eavesdropping detection is therefore very meaningful: if the presence of E is detected, the communication can be stopped at any time, or confidentiality measures such as covert transmission can be taken to reduce the risk of information leakage.

The focus of this study is the detection of eavesdropping during uplink communication, because accurate detection of eavesdropping enables better attack identification and further supports identity authentication.

Following the dataset framework of [23], the idea of using the correlation between the transmitted signal and the received signal to characterize the transmission process is introduced into the representation learning of wireless signal features. When user k sends a message requesting communication to the AP, E steals its message and imitates k while transmitting to the AP. The only information available to the AP is the received signal. At the t-th time slot, the signal received by the AP is given by

$$\begin{aligned} y_{AP}[t]={\left\{ \begin{array}{ll} \sqrt{\mathcal {L}{p_{u}}}\sum _{k=1}^{K}\textbf{p}_{k}g_{k}[t]+\textbf{n}[t], \quad \text {non-eavesdropping} \\ {\sqrt{\mathcal {L}{p_{u}}}}\sum _{k=1}^{K} \textbf{p}_{k}g_{k}[t]+\sqrt{\mathcal {L}{p_{E}}}\textbf{p}_{E}g_{E}[t]+\textbf{n}[t], \quad \text {eavesdropping},\end{array}\right. } \end{aligned}$$
(1)

where \(p_{u}\triangleq P_{u}/N_{0}\) and \(p_{E}\triangleq P_{E}/N_{0}\). In this equation, \(P_{u}\) and \(P_{E}\) denote the mean transmit power of each user and of E, respectively; \(N_{0}\) represents the average noise power per receiving antenna; and \(\textbf{n}\) is the additive white Gaussian noise (AWGN) vector with \(\textbf{n}\sim \mathcal {N}(\textbf{0},\mathbf {I_{\mathcal {L}}})\). \(y_{AP}[t]\), \(g_{k}[t]\), \(g_{E}[t]\), and \(\textbf{n}[t]\) denote the values of \(y_{AP}\), \(g_{k}\), \(g_{E}\), and \(\textbf{n}\) at time slot t.
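To make the signal model concrete, the following minimal Python sketch simulates Eq. (1) for one time slot. The Rayleigh fading model (\(g\sim \mathcal{CN}(0,1)\)), the orthonormal pilots taken from an identity matrix, and all numerical values are illustrative assumptions rather than specifications from this paper.

```python
# Minimal simulation sketch of Eq. (1); channel, pilot, and power choices are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def received_signal(L, K, p_u, p_E, eavesdropping, k_attacked=0):
    """Return y_AP[t] (length-L vector) for one time slot."""
    P = np.eye(L, dtype=complex)[:, :K]   # orthonormal pilots p_1..p_K (L x K), assumed
    g = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)   # g_k ~ CN(0,1)
    n = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) / np.sqrt(2)   # AWGN, unit power
    y = np.sqrt(L * p_u) * P @ g + n      # non-eavesdropping branch of Eq. (1)
    if eavesdropping:                     # E reuses the pilot of user k_attacked
        g_E = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)
        y += np.sqrt(L * p_E) * P[:, k_attacked] * g_E
    return y

y0 = received_signal(L=8, K=4, p_u=10.0, p_E=10.0, eavesdropping=False)
y1 = received_signal(L=8, K=4, p_u=10.0, p_E=10.0, eavesdropping=True)
```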

3.2 Creating feature dataset

This section creates the feature dataset framework. We use (1) to obtain the signal \(y_{AP}[t]\) received at the AP. Projecting the received signal onto the pilot vector \(\textbf{p}^{\dagger }_{k}\), i.e., \(y_{k}[t]=\textbf{p}^{\dagger }_{k} y_{AP}[t]\), we obtain

$$\begin{aligned} y_{k}[t]={\left\{ \begin{array}{ll} \sqrt{\mathcal {L}{p_{u}}}g_{k}[t]+\textbf{p}^{\dagger }_{k}\textbf{n}[t], \quad \text {non-eavesdropping} \\ {\sqrt{\mathcal {L}{p_{u}}}} g_{k}[t]+\sqrt{\mathcal {L}{p_{E}}}g_{E}[t]+\textbf{p}^{\dagger }_{k}\textbf{n}[t], \quad \text {eavesdropping} .\end{array}\right. } \end{aligned}$$
(2)

Let \(a_{k}[t]\triangleq |y_{k}[t]|^2\); then two quantities can be calculated at the AP:

$$\begin{aligned}{} & {} M_{k}^{\left( 1\right) }\triangleq E_{t}a_{k}[t], \end{aligned}$$
(3)
$$\begin{aligned}{} & {} M_{k}^{\left( 2 \right) }\triangleq \tfrac{E_{t}{a_{k}[t]}-E_{t}|\textbf{p}^{\dagger }_{k}\textbf{n}[t]|^2}{E_{t}|\textbf{p}^{\dagger }_{k}\textbf{n}[t]|^2}, \end{aligned}$$
(4)

where, by sufficient statistics, \(E_{t}\{\cdot \}\equiv E_{\{g_{k}\}_{k=1}^{K},\,g_{E},\,\textbf{n}}\{\cdot \}\), i.e., the expectation is taken over the realizations of \(\{g_{k}\}_{k=1}^{K}\), \(g_{E}\), and \(\textbf{n}\) at time slot t.

In the actual communication process, a user sends the pilot to the AP more than once when accessing the wireless network. Suppose the user sends the pilot vector T times; then there are T values at the AP, \(a_{k}[1],\ldots ,a_{k}[T]\). According to (3) and (4), a dataset consisting of the following two attributes can be created at the AP:

Attribute 1 (Mean):

$$\begin{aligned} A_{k}^{\left( 1 \right) }[T]\triangleq \frac{1}{T}\sum _{t=1}^{T}a_{k}[t]={\left\{ \begin{array}{ll} A_{k|H_{0}}^{\left( 1 \right) }[T],\quad \text {non-eavesdropping}\\ A_{k|H_{1}}^{\left( 1 \right) }[T],\quad \text {eavesdropping},\end{array}\right. } \end{aligned}$$
(5)

where \(H_{0}\) denotes the hypothesis that no eavesdropper is present, and \(H_{1}\) denotes the hypothesis that an eavesdropper is present.

Attribute 2 (Ratio):

$$\begin{aligned} A_{k}^{\left( 2 \right) }[T]\triangleq \tfrac{\sum _{t=1}^{T}a_{k}[t]-\sum _{t=1}^{T}|\textbf{p}^{\dagger }_{k}\textbf{n}[t]|^2}{\sum _{t=1}^{T}|\textbf{p}^{\dagger }_{k}\textbf{n}[t]|^2} ={\left\{ \begin{array}{ll} A_{k|H_{0}}^{\left( 2 \right) }[T],\quad \text {non-eavesdropping}\\ A_{k|H_{1}}^{\left( 2 \right) }[T],\quad \text {eavesdropping}.\end{array}\right. } \end{aligned}$$
(6)

When the AP acquires a sufficiently large number of samples (i.e., when T is large), then

$$\begin{aligned}{} & {} A_{k}^{\left( 1 \right) }[T]\approx M_{k}^{\left( 1\right) }, \end{aligned}$$
(7)
$$\begin{aligned}{} & {} A_{k}^{\left( 2 \right) }[T]\approx M_{k}^{\left( 2\right) }. \end{aligned}$$
(8)

According to (2), (3), and (4), we convert the received signal into feature data arranged in the tabular form of Table 1:

Table 1 Feature data: T points related to eavesdropping and T points not related to eavesdropping

The training dataset starts from the \(T_{1}\)-th time-slot data point: when \(T_{1}=1\), all data points gathered during the uplink slots are used; otherwise (\(T_{1}>1\)), only the \(T-T_{1}\) data points from the \(T_{1}\)-th to the \(T\)-th time slot are used. The location of the \(t\)-th data point in the two-dimensional feature space is \(\left( A_{k}^{\left( 1 \right) }[t],A_{k}^{\left( 2 \right) }[t] \right)\), where k indicates that we are detecting whether user k is under an eavesdropping attack.

According to (5) and (6), we obtain the labeled artificial training dataset (ATD) for the SVM algorithm and the BP neural network model, in the form of Table 2:

Table 2 ATD: labeled T points related to eavesdropping and T points not related to eavesdropping
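To illustrate how the labeled ATD of Table 2 can be produced from Eqs. (2), (5), and (6), a hedged Python sketch follows. The Rayleigh fading channels, unit noise power, and the chosen values of n, T, L, p_u, and p_E are illustrative assumptions, not parameters taken from the paper.

```python
# Sketch of ATD generation: one (Mean, Ratio, label) row per user observation window.
import numpy as np

rng = np.random.default_rng(1)

def atd_sample(T, L, p_u, p_E, eavesdropping):
    """Return (A1, A2, label) for one user observed over T uplink slots."""
    g_k = (rng.standard_normal(T) + 1j * rng.standard_normal(T)) / np.sqrt(2)
    noise = (rng.standard_normal(T) + 1j * rng.standard_normal(T)) / np.sqrt(2)  # p_k^† n[t]
    y_k = np.sqrt(L * p_u) * g_k + noise                       # Eq. (2), H0 branch
    if eavesdropping:                                          # Eq. (2), H1 branch
        g_E = (rng.standard_normal(T) + 1j * rng.standard_normal(T)) / np.sqrt(2)
        y_k += np.sqrt(L * p_E) * g_E
    a_k = np.abs(y_k) ** 2
    noise_power = np.abs(noise) ** 2
    A1 = a_k.mean()                                            # Attribute 1 (Mean), Eq. (5)
    A2 = (a_k.sum() - noise_power.sum()) / noise_power.sum()   # Attribute 2 (Ratio), Eq. (6)
    return A1, A2, int(eavesdropping)

# n labeled samples per class, T pilots per user (values are illustrative)
n, T = 200, 10
data = [atd_sample(T, L=8, p_u=10.0, p_E=10.0, eavesdropping=e)
        for e in (False, True) for _ in range(n)]
X = np.array([(a1, a2) for a1, a2, _ in data])   # feature matrix (Mean, Ratio)
y = np.array([lbl for _, _, lbl in data])        # 0 = non-eavesdropping, 1 = eavesdropping
```

The resulting X and y are reused, under the same assumptions, in the classifier sketches of Sections 4 and 5.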

4 Eavesdropping detection with machine learning

ML and DL are common outlier detection approaches, so before introducing our scheme, we first introduce the classic k-means++ and SVM methods currently in use. The ATD is fed into these two methods as well as into our proposed method, and the detection performance of the three methods is compared and analyzed in the experiments.

4.1 K-means++ clustering

Clustering is an unsupervised machine learning technique. K-means++ is an enhanced variant of the k-means clustering algorithm, specifically devised to initialize cluster centers more effectively, thereby improving clustering quality and algorithm performance. The k-means++ algorithm uses a smarter initialization method to select initial cluster centers that better represent the entire dataset, reducing the risk of the algorithm falling into a local optimum. Because this is a binary classification problem (eavesdropping vs. non-eavesdropping), we set k=2. In this paper, the ATD is used for the clustering model.

Specifically, the initialization procedure of the k-means++ algorithm proceeds as follows; a minimal usage sketch is given after the steps:

  1. Choose a sample from the dataset at random to serve as the initial cluster center.

  2. Compute, for every data point, the distance to its closest existing cluster center, and choose the data point with the maximum such distance as the next cluster center.

  3. Repeat step 2 until k cluster centers are selected. It should be noted that the k-means++ algorithm requires significant computational resources due to its high computational complexity, especially when dealing with large-scale datasets.
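As a usage illustration (not the paper's implementation), the sketch below clusters the ATD features with scikit-learn's KMeans, whose init="k-means++" option performs k-means++ seeding; note that scikit-learn uses the randomized D²-sampling variant of k-means++ rather than the deterministic farthest-point rule described in step 2. X is assumed to come from the dataset sketch in Section 3.2.

```python
from sklearn.cluster import KMeans

# k = 2 for the binary eavesdropping / non-eavesdropping task
km = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=0)
cluster_ids = km.fit_predict(X)   # unsupervised: cluster ids are not class labels
sse = km.inertia_                 # within-cluster sum of squared errors (SSE)
```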

4.2 SVM classifier

SVM is among the most frequently used methods in ML. It is a binary classifier with strong learning and generalization capabilities [39]. The fundamental model seeks the linear classifier that maximizes the margin in the feature space. In practice, many labeled samples are not linearly separable, in which case the kernel trick handles the issue well; this paper mainly conducts experiments with the RBF kernel. The SVM takes the optimal separating hyperplane as the decision surface, as shown in Fig. 2. Finding the maximum margin is therefore the main optimization problem.

The optimal hyperplane optimization problem can be formulated as

$$\begin{aligned}&\min \limits _{\textbf{w},\,b}\frac{1}{2}\Vert \textbf{w} \Vert ^{2} \\ s.t.\quad y_{i}&\left( \textbf{w}^{T}\cdot \textbf{x}_{i}+b\right) \ge 1,\,i=1,...,N, \end{aligned}$$
(9)

where N is the total number of training samples, \(\textbf{x}_{i}\) is the \(i\)-th input sample vector, and \(y_{i}\) is the \(i\)-th sample label: \(y_{i}=+1\) if \(\textbf{x}_{i}\) is labeled 0, and \(y_{i}=-1\) otherwise. The hyperplane shown in Fig. 2 lies between the two margin boundaries and satisfies \(\left\langle \textbf{w}\cdot \textbf{x}\right\rangle +b=0\). Since \({1}/{\Vert \textbf{w} \Vert }\) is the Euclidean distance from a margin boundary to the hyperplane, the goal of (9) is to maximize the margin width \({2}/{\Vert \textbf{w} \Vert }\) while correctly separating the samples.

Fig. 2 Optimal hyperplane of SVM

For data that cannot be linearly separated, we introduce slack variables \(\xi _{i}\) and control their size to achieve the optimal classification of the dataset. The optimal hyperplane is then obtained by solving the following optimization problem:

$$\begin{aligned}&\min \limits _{\textbf{w},\,b,\,\xi _{i}}\frac{1}{2}\Vert \textbf{w} \Vert ^{2}+C\sum _{i=1}^{N}\xi _{i} \\ s.t.\quad y_{i}&\left( \textbf{w}^{T}\cdot \textbf{x}_{i}+b\right) \ge 1-\xi _{i},\,\xi _{i}\ge 0,\,i=1,...,N. \end{aligned}$$
(10)

C is the regularization coefficient. When \(\xi _{i}=0\), the point lies on the margin boundary and is correctly classified; when \(0<\xi _{i}<1\), the point falls inside the margin but is still correctly classified; when \(\xi _{i}=1\), the point lies on the hyperplane; otherwise (\(\xi _{i}>1\)), the point is misclassified. According to the Lagrangian function and the KKT conditions [23], the optimization problem can be rewritten as the dual problem (11):

$$\begin{aligned} \max _{\lambda _{1}...\lambda _{N}}&\sum _{i=1}^{N}\lambda _{i}-\frac{1}{2}\sum _{i=1,\,j=1}^{N}\lambda _{i}\lambda _{j}y_{i}y_{j}K(\textbf{x}_{i},\textbf{x}_{j})\\ s.t.\quad&\sum _{i=1}^{N}\lambda _{i}y_{i}=0, \quad 0\leqslant \lambda _{i}\leqslant C,\,i=1,...,N\,. \end{aligned}$$
(11)

\(K(\textbf{x}_{i},\textbf{x}_{j})\) denotes the kernel function, an inner product that maps linearly inseparable data into a high-dimensional space for classification, and \(\lambda _{i}\) is a Lagrange multiplier.
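The paper does not state which SVM implementation was used; as a stand-in, the following sketch trains scikit-learn's SVC with the RBF kernel on the ATD features, assuming X and y from the dataset sketch in Section 3.2. The values of C, gamma, and the random seed are illustrative.

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
scaler = MinMaxScaler().fit(X_tr)                # min-max normalization, fitted on training data only
svm = SVC(kernel="rbf", C=1.0, gamma="scale")    # C is the regularization coefficient of Eq. (10)
svm.fit(scaler.transform(X_tr), y_tr)
print("SVM test accuracy:", svm.score(scaler.transform(X_te), y_te))
```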

5 Eavesdropping detection with deep learning

In this section, we present a DL-based detection scheme using a BP neural network within the deep learning framework. This paper first introduces deep learning into wireless communication wiretapping detection.

An artificial neural network (ANN) can be described as a complex, interconnected system of adaptive neurons whose structure simulates the interactive responses of the biological nervous system to external stimuli. ANNs possess the capacity for self-learning and self-organization, good error tolerance, and outstanding nonlinear approximation capabilities. The BP neural network is a multilayer feedforward neural network trained with the error backpropagation algorithm, and it is among the most widely used neural network architectures.

Figure 3 shows a typical 3-layer BP neural network. No information is exchanged between neurons in the same layer, and information is transferred between layers according to the connection weights [40]. The basic principle is to adjust the network weights and thresholds through gradient descent so that the discrepancy between the actual output and the desired output is minimal or zero [41]. We designate a hidden layer of 10 neurons and use its output as the input of the next layer to obtain the final classification prediction, thus completing the construction of a typical BP neural network model. The BP neural network exhibits strong nonlinear mapping capabilities and offers a flexible network structure. The BP model built above is used to train and classify the dataset, and the constants associated with the feature values are adjusted to achieve the best classification performance.

Fig. 3 BP neural network diagram
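A minimal PyTorch sketch of the network described above is given below: two input features (Mean and Ratio), one hidden layer of 10 sigmoid neurons, and two output classes. Anything beyond those stated sizes (layer types, the absence of an output-layer activation) is our assumption.

```python
import torch
import torch.nn as nn

class BPNet(nn.Module):
    def __init__(self, in_features=2, hidden=10, classes=2):
        super().__init__()
        self.hidden = nn.Linear(in_features, hidden)   # input layer -> 10 hidden neurons
        self.act = nn.Sigmoid()                        # sigmoid activation (Section 5)
        self.out = nn.Linear(hidden, classes)          # raw logits; softmax is applied in the loss

    def forward(self, x):
        return self.out(self.act(self.hidden(x)))

model = BPNet()
```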

To enhance detection accuracy while mitigating the risk of eavesdropping attacks, we conduct active wiretapping detection based on the model proposed above. Figure 4 shows the detection and training process based on wireless signal features, consisting of a data processing phase and a model training and debugging phase. First, the wireless signal received during communication is processed to obtain a dataset of feature attributes, which undergoes min-max normalization. The dataset is then converted to tensors and fed into the BP model for training and testing. Next, the classifier settings, loss function, and optimization function are adjusted to fine-tune the model. Finally, the eavesdropping-signal detection task is completed.

Fig. 4 BP neural network algorithm overall flow chart

The experiments are implemented with the PyTorch DL framework. 70% of the constructed dataset is used as training examples for model training, and the remaining 30% as testing data to evaluate the performance of the model. In each training run, different percentage splits between the training and test sets can be used to enhance the authenticity and credibility of the results.
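A sketch of the data-processing steps described above follows, assuming the feature matrix X and labels y from Section 3.2 and the 70/30 split stated in the text; the random seed is arbitrary.

```python
import torch
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# 70% training / 30% testing, as stated above
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# min-max normalization fitted on the training portion only, then tensor conversion
scaler = MinMaxScaler().fit(X_tr)
X_tr_t = torch.tensor(scaler.transform(X_tr), dtype=torch.float32)
X_te_t = torch.tensor(scaler.transform(X_te), dtype=torch.float32)
y_tr_t = torch.tensor(y_tr, dtype=torch.long)
y_te_t = torch.tensor(y_te, dtype=torch.long)
```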

We select the sigmoid function as the activation function. The Adam optimizer is employed to adapt the learning rate, and softmax + cross-entropy loss is used as the classification loss function to speed up computation and improve numerical stability. The softmax function is discriminative, and its output is a probability distribution: the differences between class probabilities become more pronounced, and the output distribution is closer to the true distribution. The function is given by

$$\begin{aligned} Softmax\left( \textbf{x}\right) _{i}=\tfrac{e^{x_{i}}}{\Sigma _{j}e^{x_{j}}}\,. \end{aligned}$$
(12)

Cross-entropy loss is calculated as:

$$\begin{aligned} Loss=-\sum _{i=1}^{2}y_{i}\log \hat{y}_{i}\,, \end{aligned}$$
(13)

where \(y_{i}\) is the true distribution, \(\hat{y}_{i}\) is the network output distribution, and the index runs over the two categories. The cross-entropy loss in the PyTorch deep learning framework combines softmax and cross-entropy in its computation; the final calculation formula is as follows:

$$\begin{aligned} Loss\left( \textbf{x},class\right)&=-\log \left( \ \tfrac{exp\left( x[class]\right) }{\Sigma _{j}exp\left( x[j]\right) }\right) \\&=-x[class]+\log \left( \Sigma _{j}exp\left( x[j]\right) \right) . \end{aligned}$$
(14)
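Putting the pieces together, a training-loop sketch is shown below, using the Adam optimizer and nn.CrossEntropyLoss, which combines log-softmax and the negative log-likelihood exactly as in Eq. (14). The model and tensors are assumed from the earlier sketches; the learning rate and epoch count are illustrative choices.

```python
import torch
import torch.nn as nn
import torch.optim as optim

criterion = nn.CrossEntropyLoss()                  # softmax + cross-entropy, as in Eq. (14)
optimizer = optim.Adam(model.parameters(), lr=1e-2)

for epoch in range(200):                           # epoch count is an illustrative assumption
    optimizer.zero_grad()
    logits = model(X_tr_t)                         # raw scores from the BP network
    loss = criterion(logits, y_tr_t)               # Eqs. (13)-(14)
    loss.backward()                                # error backpropagation
    optimizer.step()                               # gradient-based weight update

with torch.no_grad():
    pred = model(X_te_t).argmax(dim=1)             # predicted labels on the test set
```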

The prediction, loss computation, and learning process of the model's final layer can thus be expressed as in Fig. 5:

Fig. 5 BP neural network prediction, loss acquisition and learning process

6 Performance evaluations

In our experiments, we give several numerical examples under specific condition settings, taking into account that the state of radio signals may change in the actual transmission environment. The classification accuracy (ACC) is adopted as the evaluation metric:

$$\begin{aligned} ACC=\tfrac{TP+TN}{TP+TN+FP+FN}\,, \end{aligned}$$
(15)

where TP denotes non-tapping samples that are correctly classified, TN denotes eavesdropping samples that are correctly classified, FP denotes non-tapping samples classified as eavesdropping samples, and FN denotes eavesdropping samples classified as non-tapping samples.
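For completeness, Eq. (15) can be evaluated from the confusion-matrix counts as in the sketch below, where the positive class is taken to be non-eavesdropping (label 0), matching the definitions above; y_te and pred are assumed from the earlier sketches.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# rows/columns ordered as [non-eavesdropping (0), eavesdropping (1)]
tp, fp, fn, tn = confusion_matrix(y_te, pred.numpy(), labels=[0, 1]).ravel()
acc = (tp + tn) / (tp + tn + fp + fn)              # Eq. (15)
assert abs(acc - accuracy_score(y_te, pred.numpy())) < 1e-12
```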

To verify the effectiveness and accuracy of the BP model in wiretapping detection, this paper conducts comparative experiments with the various models on datasets from different scenarios. The results of the experiments are presented in Table 3.

Table 3 Comparison of the accuracy of different models

T represents the number of pilots sent by a single user, and the sample size n indicates the number of samples with label 0 (or label 1). The dataset features consist of Mean and Ratio as shown in Table 2, and the experiments were conducted at SNR = 10 dB. Comparing the results in Table 3, it is evident that using the BP neural network as a classifier achieves higher detection accuracy. We also found that the k-means++ clustering effect is not ideal, so we use the common clustering evaluation indexes SSE, SC, and CH index to measure the clustering performance of the k-means++ algorithm. The results for the above nine datasets are shown in Fig. 6 (the x-axis \(n\)-\(T\) refers to 200-10, 200-20, 200-50, 2000-10, ..., 4000-50).

Fig. 6 SSE, SC and CH index in different cases

SSE is one of the most common metrics for evaluating clustering, and a smaller value indicates a better clustering effect. The SC evaluates the tightness and separation of the clusters, and a larger value indicates that the cluster a data point belongs to is more reasonable. The CH index also evaluates clustering quality, with higher values indicating better clustering. According to Fig. 6, for the same n and different T, the SSE curve shows an overall upward trend while the SC and CH curves show overall downward trends; for the same T and different n, as n increases, the SSE value increases, the SC value decreases, and the CH value increases. Therefore, based on this comprehensive analysis, the k-means++ algorithm is not suitable.
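As an illustration, the three indices can be computed with scikit-learn as follows, assuming km, cluster_ids, and X from the k-means++ sketch in Section 4.1; silhouette_score and calinski_harabasz_score are used here as stand-ins for SC and the CH index.

```python
from sklearn.metrics import calinski_harabasz_score, silhouette_score

sse = km.inertia_                             # SSE: within-cluster sum of squared errors (lower is better)
sc = silhouette_score(X, cluster_ids)         # SC: cluster tightness and separation (higher is better)
ch = calinski_harabasz_score(X, cluster_ids)  # CH index (higher is better)
```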

Comparing the SVM algorithm and the BP neural network, we find that under the same sample size, the detection accuracy of both classifiers increases as T increases, and the BP model exhibits superior detection performance compared to the SVM method. As the sample size increases, the SVM detection accuracy shows an overall increasing trend, whereas the sample size has little effect on the detection results of the BP model. We visualize the classification results in Fig. 7 for the case of \(n=200\), \(T=10\): Fig. 7a shows the classification results of the reference SVM algorithm, while Fig. 7b shows the classification performance of the proposed BP neural network. Based on the comparison, it is evident that the classification performance of the BP neural network classifier proposed in this article is better.

Fig. 7 SVM and BP classification comparison chart

6.2 Comparison of precision and F1 scores between SVM and BP neural networks under different signal-to-noise ratios

Different from [22], in addition to comparing accuracy, we also add new comparative indicators under different SNRs. When E launches an eavesdropping attack on the uplink, the SNR of the \(k\)-th user decreases as the power of E increases. The SNR can also be used to judge whether the signal is being interfered with: a low SNR means poor signal quality and more noise in the signal. Therefore, the SNR is a very important parameter that can serve as a metric for assessing algorithm performance. Hence, we analyze the behavior of the SVM algorithm and the method proposed in this paper by comparing the precision and F1 score under different SNRs when n=2000 and T=10. The experimental data are summarized in Table 4.

Table 4 Comparison of experimental results with different indicators under different SNRs

Precision measures the accuracy of the model in predicting non-tapping samples; the F1 score is the harmonic mean of precision and recall, taking into account both the accuracy and the completeness of the model. According to the results in Table 4, our proposed BP neural network is almost always higher than the SVM in precision and F1 score. To further validate the model in this paper, we give the ROC curve for one of the cases in Fig. 8.

Fig. 8 SVM and BP classification comparison chart

On the ROC curve, the area under the curve (AUC) indicates the performance of the algorithm: a higher AUC value, closer to 1, indicates superior performance. The ROC curve helps us make more objective decisions in model evaluation and selection.
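The indicators in this subsection can be computed as in the sketch below, again treating non-eavesdropping (label 0) as the positive class; the trained model, test tensors, y_te, and pred are assumed from the earlier sketches.

```python
import torch
from sklearn.metrics import auc, f1_score, precision_score, roc_curve

y_pred = pred.numpy()
precision = precision_score(y_te, y_pred, pos_label=0)   # accuracy on predicted non-eavesdropping samples
f1 = f1_score(y_te, y_pred, pos_label=0)                 # harmonic mean of precision and recall

# ROC/AUC needs class scores rather than hard labels
with torch.no_grad():
    scores = torch.softmax(model(X_te_t), dim=1)[:, 0].numpy()  # P(non-eavesdropping)
fpr, tpr, _ = roc_curve(y_te, scores, pos_label=0)
roc_auc = auc(fpr, tpr)
```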

It can be seen that the classification performance of the BP neural network classifier proposed in this paper is better. Comprehensive analysis shows that the deep learning-based BP neural network model performs satisfactorily and is well suited to wireless communication eavesdropping detection scenarios.