1 Introduction

The protection of critical infrastructure has become a pressing concern because of the growing number of IoT devices and their adoption across several industries [1]. Malware that exploits zero-day vulnerabilities is a major problem in IIoT settings. Attackers first infect critical devices and then use techniques such as advanced persistent threats (APT), denial of service (DoS), and distributed DoS (DDoS) to take control of the devices and alter their operations [2, 3]. For the Iranian nuclear program, in 2013, hackers from Iran broke into the control system. Malware caused power outages for at least 80,000 customers in Ukraine, and most recently, SFG malware attacked European energy companies [4, 5]. These attacks have shown that typical cyber-security methods, including security rules, authentication, firewalls, and IDSs, are still unable to ensure the safety of critical infrastructure [6]. Typically, an IDS is deployed as an auxiliary defense, after a firewall, anti-virus software, and access control systems have been installed, to detect attacks on IIoT devices [7, 8]. Here, the term “intrusion detection system (IDS)” refers to a software and/or hardware technique used to monitor and identify suspicious activity across networked systems [9,10,11].

The work in [10] offers a feature selection approach for intrusion detection systems in the Internet of Things (IoT) that uses Information Gain (IG) and Gain Ratio (GR) and retains the top fifty percent of ranked features. The study in [11] gives readers interested in collecting, analyzing, and mining network data for social networks a brief review of the fundamentals of the field. IDSs are generally divided into signature-based and anomaly-based systems. While the former can identify known attacks, but at the expense of a large number of errors, the latter can detect both known and novel attacks [12]. An anomaly-based IDS can therefore be a strong tool if it effectively identifies both known and unknown threats that seek to infiltrate IIoT systems [13, 14].

Classical data mining methods, rule-based models, artificial intelligence approaches, and statistical models have already been reported in the literature for the construction of IDSs. However, owing to the overlap between normal and abnormal data, these techniques often produce high false positive rates (FPRs) [15]. Nearly all current IDSs rely on traditional machine learning methods to create detection models [16], and such solutions still face several obstacles [17]. Recently, the use of deep learning (DL) methods has been strongly recommended in the literature to improve the performance of IDSs. In foreseeing the future of the IoT, DL speeds up the analysis between real-time data streams and their faster virtual counterparts [18]. Owing to its superior accuracy and ease of information extraction, DL has supplanted more conventional forms of learning [19]. This has led to a number of studies that employ deep learning methods for anomaly and malware detection; nevertheless, the findings are still unconvincing [20].

The majority of IDS solutions are also extensions of technologies that already exist in other kinds of networks, including ad hoc networks for computers and mobile devices. However, an IDS developed for such systems is not applicable to IoT applications because of the distinctive features of IoT-based systems, such as access to worldwide internet resources [21]. The purpose of this study is to provide a new DL-based technique for IIoT attack prediction. Using one of the most recent datasets, the proposed method identifies IIoT threats with high accuracy and reduced prediction time. Multiple performance metrics with variable bounds are defined and used to assess the efficacy of the proposed algorithm. We also compare the simulation results with those of other state-of-the-art machine learning classifiers to demonstrate the effectiveness of the proposed method.

2 State of the art

A DL-based model using sparse evolutionary training (SET) was developed in [22] for the analysis and detection of the most common types of threats. The SET-based prediction model has a mean accuracy of 99% with an average testing duration of 2.29 ms, and it was found to improve accuracy by an average of 6.25 percent. Although the approach demonstrates strong overall detection performance and efficiency, it has certain limitations. First, the SET-based prediction model may not completely restore the optimal detection state of the original model during training. Second, it increases the number of intrusion detection steps.

Guezzaz et al. [23] present a hybrid IDS for IIoT security that is built on edge computing and uses machine learning (ML) approaches. The study combines K-Nearest Neighbor (K-NN) and Principal Component Analysis (PCA) to identify outliers and intrusions. The K-NN classifier is incorporated to enhance detection accuracy and make the relevant decisions, and PCA is used to improve feature engineering and training. The reported results show that the edge-based hybrid IDS outperforms other recent models. The method obtains an accuracy of 99.10%, a detection rate of 98.4%, and a false alarm rate (FAR) of 2.7% on the NSL-KDD dataset, and an accuracy of 98.2%, a detection rate of 97.6%, and a FAR of 2.9% on the Bot-IoT dataset. However, the work does not enhance the PK-IDS framework with advanced artificial intelligence techniques, and it neglects the unique characteristics of edge-based IIoT. In addition, that paper relies exclusively on the K-NN classifier.

Furthermore, a DL procedure, namely a Classifier-Convolution Neural Network Memory [24], and a rule-based feature-selection algorithm [25] have also been used to provide a model for recognizing different kinds of attacks. Applying deep neural networks (DNN) with bidirectional LSTM as a hybrid approach in real time poses challenges in terms of accuracy [24]. This is due to the lack of validation on dynamic datasets and the need for more reliable results when various kinds or combinations of deep learning techniques are used on diverse datasets. In [25], two datasets, namely NSL-KDD and UNSW-NB15, were used to check the effectiveness of the DL-based model. The DL-based approach is shown to have a 99.0% accuracy rate, a 99.0% detection rate, and a 1.0% false positive rate (FPR) on the NSL-KDD dataset, and a 98.9% accuracy rate, a 99.9% detection rate, and a 1.1% FPR on the UNSW-NB15 dataset. By incorporating hybrid rule-based feature selection, the method achieves more consistent results because it uses only the relevant features for classification in the datasets; however, the model may not be suitable for all features.

Moreover, a hybrid deep random neural network (HDRaNN) has been proposed for the detection of attacks in the IIoT network [26]. DS2OS and UNSW-NB15, two datasets relevant to IIoT security, are used to assess the approach. However, the integration of physical and cyber systems raises many security and privacy issues. The consistency of the model is enhanced by the hybrid rule-based feature selection approach, which uses a restricted number of suitable features for classification in the datasets. This model does not support all features equally.

As a hybrid model, the HDRaNN combines the strengths of a deep random neural network with a multilayer perceptron using dropout regularization. This technique uses two IIoT security-related datasets, DS2OS and UNSW-NB15. The performance of the method is measured with several metrics, including accuracy, precision, recall, F1 score, log loss, Receiver Operating Characteristic (ROC), and Area Under the Curve (AUC). For DS2OS and UNSW-NB15, the HDRaNN obtained an accuracy of 98% and 99%, respectively, while categorizing 16 different kinds of cyberattacks. Comparing these performance metrics against those of other cutting-edge attack detection approaches allows for an accurate assessment of its effectiveness. According to the results obtained, HDRaNN performed better than other DL-based approaches. To prevent biased classification caused by over-fitting and under-fitting, this research employs the synthetic minority over-sampling technique (SMOTE) [27]. However, not every cyberattack can be addressed by the approach this study proposes [27].

Almaiah et al. [28] present a DL-based system with two stages of cyber security and privacy. However, both proposed methods require a large data flow to achieve the desired results. To attain the necessary degree of anonymity and security, a blockchain system is first built in which all participating units are recorded, validated, and eventually certified utilizing smart (BiLSTM). The publicly available IoT-Botnet and ToN-IoT datasets provide the foundation for the experimental results. Simulation findings compared with those of benchmark models verify the higher performance of the proposed framework.

Using the benefits of the IoT for industrial process management, the IIoT is a groundbreaking endeavor to establish a smart manufacturing ecosystem. The following sectors and services benefit greatly from the fast expansion of the IIoT:

  1. IoT devices are utilized to monitor, sense, and track equipment, patients, and medications in healthcare systems.

  2. Internet of Things (IoT) devices are used for farm monitoring, smart watering of plants, and inventory management in the agricultural business.

  3. Supply chain businesses can't function without the transportation and logistics sectors. IoT devices are utilized to pinpoint the exact position of a moving vehicle in this context.

The vehicle position is also used to calculate the product’s expected delivery date. Using the IIoT, the energy industry can keep track of the grid, its invoicing, and its leakage monitoring. In the mining sector, IoT devices are utilized to manage warning systems, sense crisis signals, track underground miner activity, and monitor shipments. Industrial control systems (ICS), which encompass SCADA networks and programmable logic controllers (PLCs), is a term used to describe the robust automation sector. The majority of cyberattacks, such as Stuxnet, the German attack, the Shamoon attack, and Mirai, target industrial automated systems.

The study in [29] illustrates the learning paradigms that are being applied in industry, as well as the architecture, security, and privacy concerns that are being faced. Furthermore, the report examines a range of research issues intended to guide future improvements in the methods used to address the unusual complications that arise in these sectors. An anomaly detection method for Internet Industrial Control Systems (IICSs) is proposed in [30]. Although Network Intrusion Detection Systems (NIDSs) are designed to guard against cyberattacks, gathering the data needed to build an intelligent NIDS that can effectively identify both new and existing attacks remains a challenging problem. The study in [31] presents a method for intrusion detection in the Internet of Things (IoT) based on a specialized deep learning algorithm. However, these methods are limited to identifying attempted network intrusions.

The protection of such devices is now a significant concern because of the fast increase in the number of applications and devices that are connected to the Industrial Internet of Things (IIoT). Industrial firms are a common target of cyberattacks. IoT devices provide cybercriminals with numerous entry points into the industrial process. The classic model incorporates robust safeguards into the network infrastructure. Industrial control systems need a reliable intrusion detection technique to stave off threats.

3 IoT concept at industrial platform

Intelligent sensors, actuators, and other devices, such as radio frequency identification tags, are used in the Industrial Internet of Things (IIoT) to improve industrial and manufacturing processes. These gadgets operate together as a network to gather, share, and analyze data. Increased dependability and efficiency are facilitated by the process's insights. IIoT, sometimes referred to as the industrial internet, is used by several sectors, including manufacturing, energy management, utilities, and the oil and gas sector.

IIoT leverages the data that dumb machines have generated in industrial settings for years by using real-time analytics and smart machine capabilities. The underlying principle of IIoT is that smart machines are better than people not only at capturing and analyzing data in real time but also at conveying critical information that can make business decisions faster and more accurate. In the industrial sector in particular, IIoT has the potential to improve overall supply chain efficiency, supply chain traceability, sustainable and green practices, and quality control.

The Industrial Internet of Things (IIoT) is a system of interconnected intelligent devices that is used to monitor, gather, share, and analyze data. The following components make up each industrial IoT ecosystem:

  • Connected devices have the ability to perceive, transmit, and retain data about their own state.

  • Infrastructure for the transmission and exchange of data, both in the public and private domains.

  • Data analytics and software that transform raw data into valuable business insights.

  • Storage for the data produced by the Industrial Internet of Things (IIoT) devices.

  • Consumers.

To provide consumers with intelligence-enabled services, the IIoT architecture comprises computational objects that are coupled with the IoT infrastructure. The IoT network has been designed as a four-layer architecture, which includes the device layer, the network layer, the infrastructure layer, and the application layer. Fig. 1 gives a graphic representation of this basic architecture.

Fig. 1 A fundamental overview of the creation of the architecture for the IIoT

3.1 Typical IIoT attack categories

Industrial Internet of Things (IIoT) systems are open to a variety of threats that might jeopardize their security, interfere with their regular operations, or endanger vital infrastructure. Creating successful cybersecurity tactics requires an understanding of these prevalent IIoT threat types. Let's examine a few of the most common attack methods seen in the IIoT environment:

  1. Denial-of-Service (DoS) attacks: A kind of cyberattack in which the attacker's goal is to stop a computer or other equipment from working normally so that the intended users cannot use it.

  2. Man-in-the-Middle (MitM) attacks: A frequent kind of attack that lets attackers eavesdrop on the communication between two targets. Attackers may use flaws in devices or network infrastructure as openings to intercept, alter, or inject malicious instructions.

  3. Device exploitation: Attackers seek unauthorized control or access by exploiting vulnerabilities in equipment connected to the IIoT. By preying on security flaws such as default or weak passwords, outdated firmware, or unpatched software, they can undermine the operation of a device and possibly seize control of the whole IIoT system.

  4. Physical attacks: Physical attacks involve the deliberate manipulation of IIoT devices or infrastructure components. Adversaries may physically breach the devices to manipulate sensors, introduce harmful code, or interfere with the functioning of vital equipment. Physical attacks pose a substantial hazard to the reliability and security of industrial operations.

  5. Data interception and tampering: IIoT systems depend on devices, networks, and cloud platforms sharing data smoothly. The purpose of this kind of attack is to affect the meaning or accuracy of the data being exchanged.

  6. Supply chain attacks: A supply chain attack uses tools or services provided by a third party to infiltrate a target's system or network. By inserting malicious code or tampering with the devices, attackers can gain unauthorized access to or control over the IIoT system, which poses enormous hazards to the whole infrastructure.

  7. Firmware and software vulnerabilities: IIoT devices often depend on both software and firmware to function properly. Attackers who want to obtain unauthorized access, influence the operation of a device, or introduce malware into the system may take advantage of vulnerabilities in the firmware or software.

Organizations should take preventative actions to strengthen the security of their IIoT systems by learning about these typical IIoT threats. To reduce the risks associated with these attack vectors, it is vital to implement strong security controls, update software and firmware periodically, perform vulnerability assessments, and use safe coding and configuration standards.

4 Proposed system

The proposed model consists of a deep autoencoder (DAE) combined with a deep neural network (DNN). The model is developed using hyperparameter optimization (HPO) procedures. This research provides an alternative approach to structuring deep learning models through an HPO process based on the Archimedes optimization algorithm (AOA). Fig. 2 presents the workflow of the model that is used to detect attacks in the dataset.

Fig. 2 Steps used in the proposed model

The steps shown in Fig. 2 are briefly explained in the following subsections.

4.1 Description of dataset

The publicly available DS2OS dataset, which is often used to test the accuracy and effectiveness of cybersecurity applications, is available in [32]. It covers several attacks on sensors and applications in environments such as smart homes, smart factories, and smart buildings. The dataset contains a total of 357,952 samples and 13 features. There are 347,935 normal data values and 10,017 anomalous data values, organized into 8 categories. The dataset also has 2,500 missing values for the “Value” feature and 148 missing values for the “Accessed Node Type” feature. The attack values present in the dataset are described in Table 1.

Table 1 Descriptions of the occurrence of the assault’s values

4.2 Dataset pre-processing

Preparing the data in a form that the ML/DL method can process is the first phase of accurate data analysis. Two features in this dataset, “Accessed Node Type” and “Value”, have missing data. There are 148 “NaN” entries in the “Accessed Node Type” column. Because this feature holds categorical data, deleting these 148 rows risks losing useful information. For this reason, the “NaN” entries have been replaced with the value “Malicious”. There are also blanks in the “Value” column, which are replaced with meaningful values. The entries “True”, “False”, and “Twenty” are replaced with 1.0, 0.0, and 20.0, respectively. The data in this dataset therefore include both numerical and categorical information. Numerical data can be further divided into two types, continuous and discrete, and a categorical variable can be either ordinal or nominal. All columns in the dataset, excluding “Value” and “Timestamp”, contain nominal categorical variables; these two columns contain numeric values. The categorical data must then be transformed into feature vectors. Label encoding is used here for this transformation because it keeps the number of features constant, integrates easily with DL methods, and requires less processing time than one-hot encoding.
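The pre-processing steps described above can be sketched with pandas as follows. This is a minimal illustration on a four-row toy table, not the authors' code: the column names follow the text and may differ from the headers of the published file, the helper name `preprocess` is hypothetical, and the fill value used for blank “Value” entries (0.0) is an assumption because the text does not state it.

```python
import pandas as pd

# Toy rows standing in for DS2OS traces (hypothetical values).
raw = pd.DataFrame({
    "Source Type": ["/lightController", "/sensorService",
                    "/lightController", "/batteryService"],
    "Accessed Node Type": ["/sensorService", None,
                           "/lightController", "/sensorService"],
    "Value": ["True", "False", "Twenty", None],
    "Timestamp": [1520031600, 1520031601, 1520031602, 1520031603],
})

VALUE_MAP = {"True": 1.0, "False": 0.0, "Twenty": 20.0}
BLANK_VALUE_FILL = 0.0  # assumption: the fill value is not given in the text


def preprocess(frame: pd.DataFrame) -> pd.DataFrame:
    out = frame.copy()
    # Keep the rows with a missing node type by labelling them "Malicious".
    out["Accessed Node Type"] = out["Accessed Node Type"].fillna("Malicious")
    # Replace the textual entries of "Value" with their numeric meaning.
    mapped = pd.Series([VALUE_MAP.get(v, v) for v in out["Value"]],
                       index=out.index)
    out["Value"] = pd.to_numeric(mapped, errors="coerce").fillna(BLANK_VALUE_FILL)
    # Label-encode every nominal column; the number of columns stays constant.
    for col in out.columns:
        if col not in ("Value", "Timestamp"):
            out[col] = out[col].astype("category").cat.codes
    return out


clean = preprocess(raw)
```

Label encoding keeps the table at its original width, whereas one-hot encoding would add one column per distinct category value.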

4.3 Data classification

Specifically, the proposed deep structured model is a hybrid model that employs a deep autoencoder (DAE) for pre-training and a DNN for detection. First, the data are cleaned and pre-processed. Then, the deep model is refined using HPO to find the configuration that yields optimal detection results. At this point, sufficient data have been collected to evaluate classifiers and determine which one yields the optimal model. The hyperparameter settings of the DL algorithm have a substantial effect on performance, so the hyperparameters were tuned by the HPO method. In the proposed IDS network, pre-training with the DAE is used for feature extraction, and fine-tuning is performed with the DNN. In a DAE, encoding and decoding are used in tandem for feature extraction. The feature representation extracted from the dataset is the output of the bottleneck layer, which has a lower dimensionality than the input. Fine-tuning the DNN involves transferring the encoding structure of the DAE together with its weight and bias values.

The DAE architecture uses the pre-processed dataset (X) as its input layer. The output of the DAE is a reconstruction of X, that is, X-like data. The structure and data of the decoding layers are not passed on to the DNN model. Through training, the DNN model classifies the data into binary classes and then further into multiple classes. The optimal model is obtained by adjusting the hyperparameters while monitoring the detection rate of the attack classification.

4.3.1 Deep auto-encoder model

The major focus of this study, in part, is to develop a reliable IDS by improving the low-dimensional feature representation extracted from raw data. The purpose of the feature extraction procedure is to improve the identification and classification of binary (normal or anomalous) traffic in the DS2OS dataset. A DAE is an AE with multiple hidden layers; the additional layers enable the AE to learn more complicated data patterns mathematically. The encoding procedure of a simple AE with one hidden layer involves a mapping from the input layer to the hidden layer. A DAE with more hidden layers contains more encoder-decoder pairs. In all, there are five discrete levels to the DAE architecture. In the first stage of the DAE, encoder E1 encodes input X, encoder E2 encodes encoder E1’s output, and encoder E3 encodes encoder E2’s output. The expression Z = E3(E2(E1(X))) describes the encoding up to the intermediate layer. For an AE, the encoding vector h of a layer is h = f(W⋅X + b), where W is the weight matrix, X is the input vector, and b is the bias vector. In forward propagation, the encoding of hidden layer l is given by Eq. (1):

$$h^{{\left( {l + 1} \right)}} = f\left( {W^{\left( l \right)} \cdot h^{\left( l \right)} + b^{\left( l \right)} } \right)$$
(1)

So, each layer can be written as \(E1 = f\left( {W^{\left( 1 \right)} \cdot X + b^{\left( 1 \right)} } \right)\); \(E2 = f\left( {W^{\left( 2 \right)} \cdot E1 + b^{\left( 2 \right)} } \right)\); and \(Z = E3 = f\left( {W^{\left( 3 \right)} \cdot E2 + b^{\left( 3 \right)} } \right)\). The order of operations during decoding is reversed from that of encoding, so the first decoder, D1, is processed last. The final reconstruction is \(\hat{X} = D1(D2(D3(E3(E2(E1(X))))))\). Each decoding layer of the DAE uses a function of the form \(f\left( {W^{T} \cdot h + b^{\prime}} \right)\), so the output of D1 is the reconstruction \(\hat{X}\). The neurons in a neural network are activated or deactivated by a mathematical operation called the activation function, f(.), applied to the corresponding output signal. For example, if the output value should lie between 0 and 1 or between -1 and 1, an activation function can be chosen to map to that range.
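The encoder and decoder chain described above can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the layer widths (13, 8, 6, 4), the sigmoid activation, the random initial weights, and the helper names `encode` and `decode` are assumptions made for the example. The decoders reuse the transposed encoder weights, in line with the decoding form \(f(W^{T} \cdot h + b^{\prime})\).

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


# Hypothetical widths: 13 input features compressed to a 4-unit bottleneck.
sizes = [13, 8, 6, 4]

# One weight matrix and bias per encoder (E1, E2, E3).
W = [rng.normal(0.0, 0.1, size=(n_out, n_in))
     for n_in, n_out in zip(sizes[:-1], sizes[1:])]
b_enc = [np.zeros(n_out) for n_out in sizes[1:]]
# Decoder biases b' (the decoders reuse W transposed).
b_dec = [np.zeros(n_in) for n_in in sizes[:-1]]


def encode(x):
    """Z = E3(E2(E1(x))): apply Eq. (1) layer by layer."""
    h = x
    for W_l, b_l in zip(W, b_enc):
        h = sigmoid(W_l @ h + b_l)
    return h


def decode(z):
    """Reconstruction D1(D2(D3(z))): the layers run in reverse order."""
    h = z
    for W_l, b_l in zip(reversed(W), reversed(b_dec)):
        h = sigmoid(W_l.T @ h + b_l)
    return h


x = rng.random(13)   # one normalized sample
z = encode(x)        # bottleneck representation
x_hat = decode(z)    # reconstruction with the same size as x
```

With untrained weights the reconstruction is of course poor; training adjusts `W`, `b_enc`, and `b_dec` so that `x_hat` approaches `x`.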

The DAE uses the distance between the original input (X) and the reconstruction (\(\hat{X}\)) as its cost function. The mean squared error loss used to determine this cost is given in (2):

$$J\left( {w,b,x^{i} ,\hat{x}^{i} } \right) = \frac{1}{2}\left\| {x^{i} - \hat{x}^{i} } \right\|^{2}$$
(2)

After the input data are normalized to the range from 0 to 1, a nonlinear sigmoid function may be used in the output layer to reconstruct the input. For this function, the input may be binary or lie in the range between 0 and 1. Over all m training samples, the cost is \(J\left( {w,b} \right) = \frac{1}{m}\mathop \sum \nolimits_{i = 1}^{m} J\left( {w,b,x^{i} ,\hat{x}^{i} } \right)\), which for the sigmoid output takes the cross-entropy form:

$$J\left( {w,b} \right) = - \frac{1}{m}\mathop \sum \limits_{i = 1}^{m} \left[ {x^{i} \log \left( {\hat{x}^{i} } \right) + \left( {1 - x^{i} } \right)\log \left( {1 - \hat{x}^{i} } \right)} \right]$$
(3)
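The two reconstruction costs in Eqs. (2) and (3) can be sketched in NumPy as follows. The function names and the small clipping constant `eps` are assumptions made for the example, and the cross-entropy is written with the conventional leading minus sign so that a lower value means a better reconstruction.

```python
import numpy as np


def mse_loss(x, x_hat):
    """Eq. (2) averaged over the m rows: mean of 0.5 * ||x - x_hat||^2."""
    return float(np.mean(0.5 * np.sum((x - x_hat) ** 2, axis=1)))


def bce_loss(x, x_hat, eps=1e-12):
    """Eq. (3): cross-entropy between inputs in [0, 1] and sigmoid outputs."""
    x_hat = np.clip(x_hat, eps, 1.0 - eps)  # avoid log(0)
    per_sample = -np.sum(x * np.log(x_hat)
                         + (1.0 - x) * np.log(1.0 - x_hat), axis=1)
    return float(np.mean(per_sample))


# Two toy samples with two features each (hypothetical values).
x = np.array([[0.0, 1.0], [1.0, 0.0]])
x_rough = np.array([[0.1, 0.9], [0.8, 0.3]])
x_close = np.array([[0.01, 0.99], [0.99, 0.01]])

rough_mse = mse_loss(x, x_rough)
rough_bce = bce_loss(x, x_rough)
close_bce = bce_loss(x, x_close)
```

Both costs shrink as the reconstruction approaches the input, which is the property that backpropagation exploits during AE training.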

To minimize this cost, backpropagation iteratively adjusts the weights and biases of every node in every layer. At the optimum, the loss value is close to zero. The encoded representation becomes available once the AE training phase is complete. Transfer learning (TL) entails feeding the encoded structure (Z) into a DNN classifier. With the help of TL, the AE's weight and bias values can be transferred to the classifier while the Z-encoding structure of the AE is preserved.

To train the classifier model, a pre-training phase is performed using the AE feature extraction approach. Through transfer learning, HPO combined with an AE can change how the classifier model is trained. Improving the IDS also calls for tuning the hyperparameters of the AE model. The activation function, the learning rate, and the loss function were all tuned as part of the hyperparameter tuning procedure for feature extraction with the AE. The AE was evaluated by tracking the loss value of the AE model.

4.3.2 Deep neural network model

A DNN was implemented as the classifier in the attack detection model. The autoencoder provides the encoding process: the DNN is fed the input data X in the encoded form Z produced by AE training. An output layer y is appended to the DNN [33]. The AE's weight and bias values are then used as pre-trained values in a retraining procedure that learns the output y. The result, \(\hat{y}\), may be expressed as:

$$\hat{y} = f\left( {W^{\left( l \right)} \cdot h^{\left( l \right)} + b^{\left( l \right)} } \right) = f\left( {z^{{\left( {l + 1} \right)}} } \right)$$
(4)

Here, l + 1 is the index of the output layer and f(.) is the activation function. The rectified linear unit (ReLU) function provides benefits in comparison to other activation functions. For the hidden layers, we tested various kinds of ReLU activation functions. Sigmoid activation is employed in the output layer. Before the DNN is trained, its parameters are initialized: the layers transferred from the DAE start from the values obtained during encoding, and the remaining parameters are set to small random numbers (for example, drawn from a distribution centered on zero, such as N(0, 0.1)). After forward propagation, the output \(\hat{y}\) approximates the target value. The difference from the desired value \(y_{i}\) is calculated at each output node. The error of a hidden unit is calculated as the weighted average of the errors of the nodes that take \(h_{i}^{(l)}\) as input. Both the binary cross-entropy and the sigmoid functions are used. Figure 3 presents the architecture of the model.

Fig. 3 Architecture Diagram of DNN
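The transfer of pre-trained encoder values into the classifier, followed by fine-tuning, can be sketched in NumPy as follows. This is a simplified illustration under several assumptions: the "pre-trained" encoder weights are random stand-ins, the layer widths (13, 8, 4) are hypothetical, the data and labels are synthetic, and only the appended output layer is updated in the single gradient step shown (a full fine-tuning run would also update the hidden layers over many epochs).

```python
import numpy as np

rng = np.random.default_rng(1)


def relu(a):
    return np.maximum(a, 0.0)


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def bce(y, y_hat, eps=1e-12):
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat)))


# Stand-ins for encoder parameters learned during DAE pre-training.
pretrained_encoder = [
    (rng.normal(0.0, 0.1, (8, 13)), np.zeros(8)),
    (rng.normal(0.0, 0.1, (4, 8)), np.zeros(4)),
]

# Transfer learning: the classifier starts from copies of the encoder layers.
hidden_layers = [(W.copy(), b.copy()) for W, b in pretrained_encoder]
# The appended output layer has no pre-trained values: small random start.
output_layer = {"W": rng.normal(0.0, 0.1, (1, 4)), "b": np.zeros(1)}


def forward(X):
    h = X
    for W, b in hidden_layers:
        h = relu(h @ W.T + b)
    # Eq. (4) applied at the output layer with a sigmoid activation.
    y_hat = sigmoid(h @ output_layer["W"].T + output_layer["b"]).ravel()
    return h, y_hat


X = rng.random((32, 13))             # synthetic samples scaled to [0, 1]
y = (X[:, 0] > 0.5).astype(float)    # synthetic binary labels

h, y_hat = forward(X)
loss_before = bce(y, y_hat)

# One gradient-descent step on the output layer. For a sigmoid output with
# binary cross-entropy, the output-node error is (y_hat - y).
delta = (y_hat - y)[:, None]
output_layer["W"] -= 0.5 * (delta * h).mean(axis=0, keepdims=True)
output_layer["b"] -= 0.5 * delta.mean(axis=0)
loss_after = bce(y, forward(X)[1])
```

Starting from the encoder values means the classifier begins with features that already summarize the input, so fine-tuning mainly has to learn the mapping from those features to the class label.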

4.3.3 Archimedes optimization algorithm (AOA)

The AOA is a population-based metaheuristic algorithm [34, 35]. The proposed method uses the AOA for the HPO of the DAE-based model, where the population members are objects immersed in a fluid. Like other population-based metaheuristic algorithms, the AOA begins the search with an initial population of objects that have random volumes, densities, and accelerations. At this stage, each object is also given a random initial position in the fluid. After the fitness of the initial population has been assessed, the AOA iterates until a termination condition is met. The density and volume of every object are updated at each iteration. The acceleration of an object is updated according to whether it collides with a neighboring object. The new position of an object is then calculated from its updated density, volume, and acceleration as follows:

  a. Algorithmic phases

This subsection presents the mathematical formulation of the AOA. The AOA can be regarded as a global optimization approach because it theoretically incorporates both exploration and exploitation. The mathematical steps of the AOA are as follows.

Step 1: Initialization. Initialize the positions of all objects using (5):

$$O_{i} = lb_{i} + rand \times \left( {ub_{i} - lb_{i} } \right);\quad i = 1,2, \ldots ,N$$
(5)

Here, \(O_{i}\) is the ith object in a population of N objects. The search space is bounded below by \(lb_{i}\) and above by \(ub_{i}\).

Initialize the density (den) and volume (vol) of the ith object using (6):

$$\begin{aligned} den_{i} & = rand \\ vol_{i} & = rand \\ \end{aligned}$$
(6)

Here, rand is a vector of random values between zero and one.

Finally, initialize the acceleration of the ith object using (7):

$$acc_{i} = lb_{i} + rand \times \left( {ub_{i} - lb_{i} } \right)$$
(7)

In this step, the fitness of the initial population is evaluated and the object with the best fitness value is selected. Assign \(x_{best}\), \(den_{best}\), \(vol_{best}\), and \(acc_{best}\).
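Step 1 can be sketched in NumPy as follows. The population size, the bounds, and the `sphere` fitness function are placeholders chosen for the example; in the hyperparameter search described in this paper, each position would encode a hyperparameter setting and the fitness would come from training and evaluating the model, which is not shown here.

```python
import numpy as np


def initialize(n_objects, lb, ub, rng):
    """Random start for positions, densities, volumes, and accelerations."""
    lb = np.asarray(lb, dtype=float)
    ub = np.asarray(ub, dtype=float)
    shape = (n_objects, lb.size)
    x = lb + rng.random(shape) * (ub - lb)     # Eq. (5)
    den = rng.random(shape)                    # Eq. (6)
    vol = rng.random(shape)                    # Eq. (6)
    acc = lb + rng.random(shape) * (ub - lb)   # Eq. (7)
    return x, den, vol, acc


def sphere(position):
    """Placeholder fitness (lower is better); not the paper's objective."""
    return float(np.sum(position ** 2))


rng = np.random.default_rng(42)
x, den, vol, acc = initialize(10, lb=[-5, -5, -5], ub=[5, 5, 5], rng=rng)

# Evaluate the initial population and remember the best object.
fitness = np.array([sphere(row) for row in x])
best = int(np.argmin(fitness))
x_best, den_best, vol_best, acc_best = x[best], den[best], vol[best], acc[best]
```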

Step 2: Update of densities and volumes. For iteration t + 1, the density and volume of object i are updated using (8):

$$\begin{aligned} den_{i}^{t + 1} & = den_{i}^{t} + rand \times \left( {den_{best} - den_{i}^{t} } \right) \\ vol_{i}^{t + 1} & = vol_{i}^{t} + rand \times \left( {vol_{best} - vol_{i}^{t} } \right) \\ \end{aligned}$$
(8)

Here, \(vol_{best}\) and \(den_{best}\) are the volume and density of the best object found so far, and rand is a uniformly distributed random number. In the beginning, collisions between objects occur, and after a period of time the objects try to reach an equilibrium state. The AOA implements this with the aid of the transfer operator TF, which shifts the search from exploration to exploitation, as defined in (9):

$$TF = exp\left( {\frac{{t - t_{max} }}{{t_{max} }}} \right)$$
(9)

TF increases gradually over time until it reaches 1. In (9), t is the current iteration number and \(t_{max}\) is the maximum number of iterations. The density-decreasing factor d likewise helps the AOA move from global to local search. It decreases over time according to (10):

$$d^{t + 1} = exp\left( {\frac{{t_{max} - t}}{{t_{max} }}} \right) - \left( {\frac{t}{{t_{max} }}} \right)$$
(10)
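The updates in Eqs. (8) to (10) can be sketched in NumPy as follows. The function names and the choice of 100 iterations are assumptions made for the example. The last lines count how many iterations satisfy TF ≤ 0.5, the condition that the algorithm uses to select the exploration phase.

```python
import numpy as np


def update_density_volume(den, vol, den_best, vol_best, rng):
    """Eq. (8): move each value a random fraction toward the best object."""
    den_new = den + rng.random(den.shape) * (den_best - den)
    vol_new = vol + rng.random(vol.shape) * (vol_best - vol)
    return den_new, vol_new


def transfer_operator(t, t_max):
    """Eq. (9): increases toward 1 as t approaches t_max."""
    return np.exp((t - t_max) / t_max)


def density_factor(t, t_max):
    """Eq. (10): decreases toward 0 as t approaches t_max."""
    return np.exp((t_max - t) / t_max) - t / t_max


t_max = 100
iterations = np.arange(1, t_max + 1)
tf = transfer_operator(iterations, t_max)
d = density_factor(iterations, t_max)

# Number of iterations that fall in the exploration phase (TF <= 0.5).
exploration_iters = int(np.sum(tf <= 0.5))
```

For `t_max = 100`, the condition TF ≤ 0.5 holds for the first 30 iterations, since it is equivalent to t ≤ t_max(1 + ln 0.5) ≈ 0.307 t_max.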

The value of \(d^{t + 1}\) decreases with time, which allows the search to converge on a promising region that has already been identified. This variable must be handled properly so that exploration and exploitation remain balanced in the AOA. Exploration phase: if TF ≤ 0.5, a collision between objects occurs; a random material (mr) is selected, and the acceleration of the object is updated for the next iteration (t + 1) using (11):

$$acc_{i}^{t + 1} = \frac{{den_{mr} + vol_{mr} \times acc_{mr} }}{{den_{i}^{t + 1} \times vol_{i}^{t + 1} }}$$
(11)

Here, \(den_{i}\), \(vol_{i}\), and \(acc_{i}\) are the density, volume, and acceleration of object i, whereas \(acc_{mr}\), \(den_{mr}\), and \(vol_{mr}\) are the acceleration, density, and volume of the random material. Note that, by (9), TF ≤ 0.5 holds only while t ≤ (1 − ln 2) t_max ≈ 0.31 t_max, so this threshold places roughly one-third of the iterations in the exploration phase. Using a threshold other than 0.5 changes the exploration–exploitation behavior. In the exploitation phase, when TF > 0.5 and there is no collision among objects, the acceleration of each object is updated for iteration t + 1 using (12).

$$acc_{i}^{t + 1} = \frac{{den_{best} + vol_{best} \times acc_{best} }}{{den_{i}^{t + 1} \times vol_{i}^{t + 1} }}$$
(12)

where \(acc_{best}\) is the acceleration of the best object.
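Equations (11) and (12) share the same form and differ only in the source object: a random material when TF ≤ 0.5 and the best object otherwise. The sketch below makes this explicit. The dictionary layout with keys `den`, `vol`, and `acc`, and the per-dimension evaluation, are assumptions made for illustration.

```python
import random

def raw_acceleration(src, den_i_new, vol_i_new):
    """Common form of (11) and (12): (den_src + vol_src * acc_src) / (den_i * vol_i)."""
    return [
        (ds + vs * a_s) / (di * vi)
        for ds, vs, a_s, di, vi in zip(
            src["den"], src["vol"], src["acc"], den_i_new, vol_i_new
        )
    ]

def update_acceleration(tf, den_i_new, vol_i_new, population, best, rng):
    """Use a random material (Eq. 11) if TF <= 0.5, else the best object (Eq. 12)."""
    src = rng.choice(population) if tf <= 0.5 else best
    return raw_acceleration(src, den_i_new, vol_i_new)

rng = random.Random(0)
best = {"den": [1.0], "vol": [2.0], "acc": [3.0]}
material = {"den": [0.2], "vol": [0.5], "acc": [2.0]}

# Exploitation: (1.0 + 2.0 * 3.0) / (0.5 * 4.0)
acc_exploit = update_acceleration(0.9, [0.5], [4.0], [material], best, rng)
# Exploration: (0.2 + 0.5 * 2.0) / (0.5 * 4.0)
acc_explore = update_acceleration(0.2, [0.5], [4.0], [material], best, rng)
```

The denominator uses the already updated density and volume of object i, so the call is meant to follow the update in (8). A zero density or volume would cause a division by zero; this sketch does not guard against that case.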

Step 3: Normalization of the acceleration. The acceleration is normalized using (13) to compute the percentage of change:

$$acc_{i - norm}^{t + 1} = u \times \frac{{acc_{i}^{t + 1} - {\text{min}}\left( {acc} \right)}}{{\max \left( {acc} \right) - {\text{min}}\left( {acc} \right)}} + l$$
(13)

where u and l define the normalization range and are set to 0.9 and 0.1, respectively. The normalized acceleration \(acc_{i-norm}^{t+1}\) determines the percentage of the step that each agent changes. A high acceleration value indicates that object i is in the exploration phase, whereas a low value indicates that it is in the exploitation phase. This reflects the transition of the search from exploration to exploitation. The acceleration typically starts at a high value and decreases with time. As a result, the search agents are guided toward the global best solution while moving away from local solutions. It is worth noting, however, that a small number of search agents may need a longer exploration period than usual. In this way, AOA balances exploration and exploitation.
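A minimal sketch of the normalization in (13) is given below. It treats the accelerations as a flat list of scalars, with the minimum and maximum taken over that list. The handling of the degenerate case in which all values are equal is an assumption, since the text does not specify it.

```python
def normalize_acceleration(acc, u=0.9, l=0.1):
    """Eq. (13): map accelerations linearly from [min, max] to [l, u + l]."""
    lo, hi = min(acc), max(acc)
    if hi == lo:
        # Assumption: with no spread, return the lower bound for every object
        return [l for _ in acc]
    return [u * (a - lo) / (hi - lo) + l for a in acc]

acc_norm = normalize_acceleration([2.0, 4.0, 6.0])
```

For the three sample values, the smallest acceleration maps to l = 0.1, the largest to u + l = 1.0, and the middle one to the midpoint of that range.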

Step 5: Update of the position. When TF ≤ 0.5 (exploration phase), the position of object i for the next iteration, t + 1, is calculated using (14):

$$x_{i}^{t + 1} = x_{i}^{t} + C_{1} \times rand \times acc_{i - norm}^{t + 1} \times d \times \left( {x_{rand} - x_{i}^{t} } \right)$$
(14)

Here, \(C_{1}\) is a constant equal to 2. If \(TF > 0.5\), the objects update their positions using (15).

$$x_{i}^{t + 1} = x_{best}^{t} + F \times C_{2} \times rand \times acc_{i - norm}^{t + 1} \times d \times \left( {T \times x_{best} - x_{i}^{t} } \right)$$
(15)

Here, \(C_{2}\) is a constant equal to 6. The variable T is defined as \(T = C_{3} \times TF\), where \(C_{3}\) is a constant and TF is the transfer operator, so T grows as TF grows. T increases over time in the range \(\left[ {C_{3} \times 0.3, 1} \right]\) and initially takes a certain percentage of the best position. In the beginning, this percentage is low, so the gap between the best position and the current position is wide and the steps of the random walk are large. The percentage increases steadily as the search proceeds, which reduces the distance between the best position and the current position. This yields an appropriate balance between exploration and exploitation.

Here, F is the flag that changes the direction of motion, as defined in (16):

$$F = \left\{ {\begin{array}{*{20}l} { + 1 \quad if\; P \le 0.5} \\ { - 1 \quad if \;P > 0.5} \\ \end{array} } \right.$$
(16)

where \(P = 2 \times rand - C_{4} .\)
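The position updates in (14) and (15), together with the direction flag in (16), can be sketched as below. The constants \(C_1 = 2\) and \(C_2 = 6\) follow the text. The values passed for `c3` and `c4`, the clipping of T at 1, and the independent draw of `rand` per dimension are assumptions made for illustration and are not stated in this section.

```python
import random

def direction_flag(rng, c4):
    """Eq. (16): F = +1 if P <= 0.5, else -1, with P = 2 * rand - C4."""
    p = 2.0 * rng.random() - c4
    return 1 if p <= 0.5 else -1

def update_position(x_i, x_rand, x_best, acc_norm_i, tf, d, rng, c3, c4,
                    c1=2.0, c2=6.0):
    """Apply Eq. (14) if TF <= 0.5 (exploration), else Eq. (15) (exploitation)."""
    if tf <= 0.5:
        return [
            xi + c1 * rng.random() * a * d * (xr - xi)
            for xi, xr, a in zip(x_i, x_rand, acc_norm_i)
        ]
    t_var = min(c3 * tf, 1.0)  # assumption: T is capped at the stated upper bound 1
    f = direction_flag(rng, c4)
    return [
        xb + f * c2 * rng.random() * a * d * (t_var * xb - xi)
        for xi, xb, a in zip(x_i, x_best, acc_norm_i)
    ]

rng = random.Random(2)
x_new = update_position(
    x_i=[1.0, 2.0], x_rand=[0.0, 0.0], x_best=[0.5, 0.5],
    acc_norm_i=[0.55, 0.55], tf=0.3, d=2.0, rng=rng,
    c3=1.0, c4=0.5,  # illustrative values only
)
```

Two limiting cases help to read the formulas: in (14) an object that coincides with the randomly chosen object does not move, and in (15) an object with zero normalized acceleration is placed exactly at the best position.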

Step 6: Evaluation. Evaluate each object using the objective function f and record the best solution found so far. Assign \(x_{best}\), \(den_{best}\), \(vol_{best}\), and \(acc_{best}\).

5 Results and discussion

The proposed optimized method is tested, and compared with existing deep learning classifiers, on a Dell G5 gaming PC. The system has a 4.7 GHz Intel Core i7-9700 CPU with turbo technology and 16 GB of DDR4 RAM. A GB graphics card was added to support the machine learning procedures. The approach is implemented in Python using Anaconda Navigator.

5.1 Performance measure

Performance measures provide a consistent evaluation of outcomes and yield trustworthy data on the effectiveness of the proposed approach. In addition, the process of reporting, collecting, and evaluating information on the effects of the attacks is indicative of efficacy. The confusion matrix used to assess the classifier on categorical data, together with its definitions, is summarized in Table 2.

Table 2 Confusion matrix

True positives (TP) are positive instances that have been correctly classified as positive. False positives (FP) are negative instances that have been incorrectly classified as positive. Similarly, false negatives (FN) are positive instances that have been incorrectly classified as negative, and true negatives (TN) are negative instances that have been correctly classified as negative.

The mathematical equations for the estimation of the performance parameters are given as:

$$Accuracy\left( {ACC} \right) = \frac{TP + TN}{TP + TN + FN + FP} \times 100$$
(17)

$$F\text{-}measure\left( {F\text{-}M} \right) = \frac{2TP}{2TP + FP + FN} \times 100$$
(18)

$$Precision\left( {PR} \right) = \frac{TP}{FP + TP} \times 100$$
(19)

$$Recall\left( {RC} \right) = \frac{TP}{FN + TP} \times 100$$
(20)
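The four measures in (17) to (20) can be computed directly from the confusion-matrix counts, as in the short sketch below. The counts in the usage example are made-up numbers chosen for illustration and are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, F-measure, precision, and recall in percent, per (17)-(20)."""
    return {
        "accuracy": 100.0 * (tp + tn) / (tp + tn + fn + fp),
        "f_measure": 100.0 * 2 * tp / (2 * tp + fp + fn),
        "precision": 100.0 * tp / (fp + tp),
        "recall": 100.0 * tp / (fn + tp),
    }

# Hypothetical counts, for illustration only
metrics = classification_metrics(tp=90, tn=95, fp=5, fn=10)
```

The F-measure in (18) is algebraically the harmonic mean of precision and recall, which is why it is high only when both of them are high.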

The existing techniques, such as KNN [23], CNN-LSTM [24], Bi-LSTM [28], and HDRaNN [26], and the proposed technique are tested on the DS2OS dataset to assess the effectiveness of the proposed model. The results obtained with the existing and proposed techniques are shown in Figs. 4, 5, 6, 7, 8, 9, 10 and 11.

Fig. 4 Percentage F-measure for 60% training data

Fig. 5 Percentage recall for 60% training data

Fig. 6 Percentage accuracy for 60% training data

Fig. 7 Percentage precision for 60% training data

Fig. 8 Percentage F-measure for 80% training data

Fig. 9 Percentage recall for 80% training data

Fig. 10 Percentage accuracy for 80% training data

Fig. 11 Percentage precision for 80% training data

Figures 4, 5, 6 and 7 correspond to 60% training data, whereas Figs. 8, 9, 10 and 11 correspond to 80% training data from the given dataset. It is evident from Figs. 4, 5, 6 and 7 that the proposed optimized approach outperforms the existing techniques. The F-measure of the proposed model is 98.97%, and the smallest F-measure is obtained with KNN [23]. The high F-measure of the proposed approach indicates a good tradeoff between precision and recall, so the model is proficient in detecting and classifying attacks accurately. Further, because a high F-measure requires high precision, the number of false positives, and hence the FPR, of the proposed system is also expected to be small. In summary, the high F-measure obtained with the proposed optimized approach implies the following attributes:

  1. Improved threat recognition
  2. Small FPR
  3. Ability to detect genuine attacks
  4. Improved reliability
  5. Quick detection of genuine attacks owing to the low FPR

Additionally, a high recall percentage is important for meeting industrial cyber-security requirements and for avoiding legal issues. A high recall also indicates the capability for early protection against advanced threats. Hence, the proposed model is suitable for detecting advanced threats with increased confidence. Further, the F1-scores of the existing and proposed techniques are estimated and shown in Fig. 12.

Fig. 12 Comparison of the outcomes of the different techniques for the index 'F1-Score'

The index 'F1-Score' combines precision and recall into a single metric and is often used to assess the effectiveness of a model. A high F1-score indicates that the model keeps both types of error, namely FP and FN, small, which matters whenever a balance between precision and recall is required. Figure 12 shows that the proposed model has an F1-score of 0.96, which is close to unity, and hence the proposed model characterizes the attacks more accurately than the existing techniques. Further, Figs. 8, 9, 10 and 11 show that the performance of the proposed model improves when the share of training data is increased from 60% to 80%. In summary, the proposed model is robust and reliable, and it offers a good tradeoff between recall and precision. Hence, it can be used in industrial applications.

6 Conclusion

A novel pre-training method based on the hybridization of DNN and DAE has been developed in this work for the fast detection of attacks with increased accuracy and a reduced false rate. The proposed method offers an alternative to existing deep learning architectures through an HPO procedure that incorporates the Archimedes optimization algorithm (AOA). The existing techniques and the proposed technique are tested on the DS2OS dataset to assess the effectiveness of the proposed technique. The performance parameters, namely accuracy, precision, recall, F-measure, and F1-score, have been estimated on this common dataset. The comparative analysis of the results shows that the proposed model provides more accurate results than the existing techniques. The percentage F-measure of the proposed technique is the highest, at 98.97%, which indicates improved threat recognition with a small FPR and greater reliability. Further, the F1-score of the proposed model is found to be nearly equal to unity. This high F1-score gives the proposed model the following attributes over the existing techniques:

  1. A good trade-off between recall and precision
  2. Higher reliability than the other existing techniques
  3. Smaller value of FPR than the other existing techniques
  4. Faster detection of attacks than the other existing techniques
  5. More cost-effective than the other existing techniques due to the detection of only real attacks

Furthermore, increasing the amount of training data can enhance the performance of the proposed model. Therefore, in view of the aforementioned attributes, the proposed optimized approach can be used in industrial applications with good confidence and reliability.