Introduction

Proteins are essential to organisms and participate in virtually every process within cells. Despite their wide range of functions, all proteins are built from the same twenty building blocks, called amino acids (AAs), combined in different ways. AAs are made of carbon, oxygen, nitrogen, and hydrogen, and some contain sulphur atoms. These atoms form an amino group, a carboxyl group, and a side chain attached to a central carbon atom, as shown in Fig. 1. The side chain determines the AA's properties and is the only part that varies from one AA to another.

Fig. 1 Structure of an amino acid

Two AA molecules can be covalently joined by a substituted amide linkage termed a peptide bond, yielding a dipeptide [1]. Such a linkage is formed by the removal of the elements of water (dehydration) from the alpha-carboxyl group of one AA and the alpha-amino group of another, as depicted in Fig. 2. Similarly, three AAs can be joined by two peptide bonds to form a tripeptide, four to form a tetrapeptide, and so on. When many AAs are joined in this fashion, the product is called a polypeptide. An AA in a peptide is often called a residue, i.e., the part left over after losing the water. A protein may have thousands of AA residues. Generally, the terms protein and polypeptide are used interchangeably, although molecules referred to as polypeptides have a molecular weight (MW) below 10,000 daltons and those called proteins have a higher MW.

Fig. 2 Formation of the peptide bond

Proteins usually do not function alone; they need a partner to accomplish their functions. The partner may be DNA, RNA, or another protein. A single protein inside the cell is not very functional by itself; proteins accomplish their roles together. When a protein interacts with another protein, or when two or more proteins cross-talk with each other through some signaling process, this is termed a protein–protein interaction (PPI) [2]. Proteins control and mediate many of the biological activities of the cell through these interactions, for example muscle contraction (made possible by PPIs between actin and myosin filaments), cell signaling, and cellular transport (molecules moving in and out of the cell via PPIs) [3]. PPIs therefore play a vital role in many cellular processes.

However, the disruption of normal interactions or the formation of abnormal ones can lead to a disease state. Because some diseases show their symptoms only at a late stage, which may complicate medication or even be deadly, many researchers aim to predict PPIs at the early stages of disease symptoms. Prior information about PPIs can offer a clear path to detecting drug targets, further biological processes, and new remedies for diseases [3]. Compared to experimental methods, such as tandem affinity purification (TAP) [4], protein chips [5], and other biological methods, computational approaches are showing better promise for PPI prediction, as they are less time-consuming and more proficient [6].

Machine learning (ML) methodologies dominate the computational methods for predicting PPIs [7, 8]. Framing a suitable feature set and selecting a favorable machine learning algorithm are the two major stages of successful prediction. The feature set should be constructed wisely so that it covers the maximum information, or key features, of the protein's structure. Among the structures, the primary structure, i.e., the protein sequence, is the most common to work on because of the huge availability of data [9]. Several feature extraction methods have been developed in the past for representing protein information in numerical form, and these are widely used to extract protein interaction information [10,11,12,13,14,15]. For PPI prediction, each feature extraction algorithm requires a favorable classifier to appropriately classify interaction or non-interaction according to the feature sets. Various classification algorithms have been used, like RF, SVM and their derivatives [16], gradient-boosting decision trees [17], and ensemble classifiers [18].

Recently, DL technology has come into the limelight with numerous scientific studies that help in many applications like image recognition [19], speech recognition [20], machine translation [21], computer vision [22], and many more. Within DL, DNNs, RNNs, and CNNs in particular have contributed a lot to real-life applications and eased human effort. Numerous noteworthy DL-based studies are being published in the field of bioinformatics [23, 24].

This paper focuses on DL approaches used in the PPI prediction task; in the successive sections, the short name deep networks (DNs) is used to represent DNNs, CNNs, and RNNs and their variants.

The aim of this paper is to provide a comprehensive survey of DN applications in the field of PPI prediction. In this review, the recent progress in applying DN techniques to the problem of PPI prediction is summarized, and the possible pros and cons are discussed. The scope of this paper is limited to the primary structure of the protein, i.e., sequence-based PPI prediction with DNs. The significance of, and the approaches to, representing protein sequences for DNs are discussed for the first time, and the central importance of a protein's primary structure is emphasized.

The paper is organized as follows: the “Introduction” section presents an outline of proteins, the importance of PPIs, several methods to detect PPIs, and recent advancements of computational approaches in the field of bioinformatics. The “Outline of Deep Networks” section introduces the concept of DNs and how they can prove beneficial in PPI prediction. The “Approaches for Sequence-Based Protein–Protein Interaction Prediction Using Deep Networks” section illustrates the various research publications on sequence-based PPI prediction using DNs, along with their pros and cons and the performance achieved. The “Implementation of Cited Papers” section presents our manual implementation of cited papers. To analyze the adeptness of DNs in PPI prediction, a fair comparison with state-of-the-art methods is made in the “Comparison with State-of-the-art Methods” section. Finally, the paper is concluded with future prospects in this area. This review aims to help both computational biologists achieve familiarity with the DN methods applied in protein modeling, and computer scientists expand their perspective on the biologically significant problems that may benefit from DL methods.

Outline of Deep Networks

Deep learning architectures can be understood as ANNs with several layers, and researchers have contributed several types of DL architectures depending on the input considered and the purpose of the particular research. This review mainly considers three DL architectures: DNNs, CNNs, and RNNs. Several researchers include all DL architectures under DNNs [25, 26]; this paper uses 'DNNs' to discuss specifically SAEs [27], which use AEs [28] as the elementary units of NNs [29]. The reason behind these choices is the limited scope of this paper, which mainly focuses on delivering the significance of DNs using the sequential information of the PPI input data for the prediction task.

Generally, in DL architectures there are two principal elements that lift performance: optimization and regularization. The target during training is to optimize the weight parameters in each layer so that the important and relevant features are learned from the input, irrelevant information is filtered out, and an abstract form, or a reduced number of features, is transferred to the next layer. The optimization procedure follows an algorithm that updates the weight parameters based on SGD [30]. Regularization is a process to avoid the over-fitting problem that usually occurs during training. Several regularization processes have been developed, like weight decay [31], Dropout [32], and rnnDrop [33]. Recently, a further regularization technique was proposed [34], which operates on mini-batches by normalizing features (batch normalization).
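
To make these two elements concrete, the following minimal Keras sketch (with hypothetical layer sizes and rates, not tied to any cited model) combines SGD-based optimization [30] with Dropout [32] and batch normalization [34]:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization
from tensorflow.keras.optimizers import SGD

# Hypothetical sizes: a 400-dim protein feature vector, binary PPI output.
model = Sequential([
    Dense(256, activation="relu", input_shape=(400,)),
    BatchNormalization(),  # normalizes features over each mini-batch [34]
    Dropout(0.5),          # randomly silences units to curb over-fitting [32]
    Dense(64, activation="relu"),
    Dropout(0.5),
    Dense(1, activation="sigmoid"),
])

# Weight parameters are updated by stochastic gradient descent [30].
model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.9),
              loss="binary_crossentropy", metrics=["accuracy"])
```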

The following part of this section gives brief knowledge about the three DL approaches, DNNs, RNNs, and CNNs, that have greatly contributed to the PPI prediction task using sequential information only.

Deep Neural Networks

A DNN, in simple words, is a network that is deep, i.e., one that has many hidden layers between the input layer and the output layer, as shown in Fig. 3. For the given input data, the outputs are calculated sequentially through the layers of the network. The input vector at each layer comprises the outputs of the previous layer's units, which are multiplied by the weight vector of the considered layer to produce a weighted sum. The output of a particular layer is computed by applying some non-linear function (ReLU, sigmoid, etc.) [35] to the weighted sum, which results in a more abstract representation of the previous layer's output, as follows [36]:

Fig. 3 Basic structure of DNNs with input units I, three hidden units h1, h2, and h3 in each layer, and output units O. At each layer, the weighted sum and a non-linear function of its inputs are computed to obtain an abstract representation

$${p}_{x}^{(O+1)}= \mu \left({w}_{x}^{(O+1)}{p}^{O}+{z}_{x}^{(O+1)}\right)$$
(1)

where \(\mu\) represents the activation, w is the weight matrix, \({p}^{O}\) is the input data for the Oth layer, and z is the bias term.
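
Equation (1) translates directly into code. A minimal NumPy sketch of the layer-by-layer forward pass, with random weights and hypothetical layer widths, is:

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def forward(p, weights, biases, mu=sigmoid):
    """Apply Eq. (1) at every layer: p^(O+1) = mu(w^(O+1) p^O + z^(O+1))."""
    for w, z in zip(weights, biases):
        p = mu(w @ p + z)  # weighted sum plus bias, then non-linearity
    return p

rng = np.random.default_rng(0)
sizes = [20, 16, 8, 1]  # hypothetical widths: input, two hidden layers, output
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes, sizes[1:])]
biases = [rng.standard_normal(m) for m in sizes[1:]]
output = forward(rng.standard_normal(sizes[0]), weights, biases)
```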

DNNs work very well for scrutinizing high-dimensional data. Good research in bioinformatics cannot be done with small data; the data available in this field are usually high-dimensional and complex, so DNNs promise favorable opportunities for researchers in the field. DNNs have the potential to yield more readily comprehensible knowledge by extracting highly abstract and related information from the data. Although raw data is the only requirement for DNNs to learn hierarchical features, manually crafted features have frequently been given as inputs, which suggests that the abilities of DNNs have not yet been fully exploited. It is believed that the future advancement of DNNs in bioinformatics will come from investigations into appropriate ways to encode raw information and learn suitable features from it.

Recurrent Neural Networks

The structure of RNNs has a recurrent link in each hidden layer, responsible for processing sequential information through a recurrent computation, as shown in Fig. 4. The previous output (state vector) is kept in the hidden units, and for the current state the output is calculated using the previous state vector and the current input [37]. The following two equations express the evolution of an RNN over time [38]:

Fig. 4 Basic structure of RNNs with an input unit I, a hidden unit h, and an output unit O. The recurrent computation can be expressed more explicitly if the RNN is unrolled in time. The index of each symbol represents the time step: ht receives input from It and ht-1 and then propagates the computed results to Ot and ht+1

$${O}_{t} =\delta \left({h}_{t};\, \theta \right)$$
(2)
$${h}_{t} =g\left({h}_{t-1}, {I}_{t} ;\,\theta \right)$$
(3)

here, \(\theta\) includes the weights and biases of the network; the first equation expresses the dependency of the output \({O}_{t}\) at time t only on the hidden layer \({h}_{t}\) via some computation function \(\delta\), and the second equation shows the dependency of the hidden layer \({h}_{t}\) at time t on \({h}_{t-1}\) at time t-1 and the input \({I}_{t}\) at time t.
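
A minimal NumPy sketch of Eqs. (2) and (3), taking a tanh transition for g and a linear readout for \(\delta\) (both choices, and all sizes, are illustrative):

```python
import numpy as np

def rnn(inputs, W_h, W_i, W_o, b_h, b_o):
    """Unroll an RNN: Eq. (3) updates the state, Eq. (2) emits the output."""
    h = np.zeros(W_h.shape[0])  # initial hidden state
    outputs = []
    for I_t in inputs:                          # one step per sequence position
        h = np.tanh(W_h @ h + W_i @ I_t + b_h)  # Eq. (3): h_t from h_{t-1}, I_t
        outputs.append(W_o @ h + b_o)           # Eq. (2): O_t depends on h_t only
    return outputs

rng = np.random.default_rng(1)
d_in, d_h, d_out, T = 4, 8, 2, 5  # hypothetical dimensions and sequence length
outs = rnn(rng.standard_normal((T, d_in)),
           rng.standard_normal((d_h, d_h)), rng.standard_normal((d_h, d_in)),
           rng.standard_normal((d_out, d_h)),
           rng.standard_normal(d_h), rng.standard_normal(d_out))
```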

RNNs, specifically BRNNs, are popularly used in applications where previous information is required for the current output (as shown in Fig. 5), like speech recognition, Google Translate, etc. The RNN structure appears simpler than DNNs in terms of the number of layers, but if the structure of an RNN is unrolled in time, it is even deeper.

Fig. 5 Basic structure of BRNNs unrolled in time. For each time step, there are two hidden layers. The information from both hidden units is propagated to Ot

However, this leads to two well-known hindrances: vanishing gradients and long-term dependencies. Researchers have overcome these issues by adding complex units, developing variants of RNNs like LSTM and GRU. Today, RNNs are utilized effectively in numerous domains, including NLP and language interpretation [39,40,41,42]. The nature of identifying PPIs is practically identical to the modeling tasks undertaken in NLP research, as both are intended to analyze the mutual influence of two sequences based on their underlying features. Protein sequences are more conserved in their groupings and span a wider range of lengths. Therefore, accurately covering PPIs not only requires significantly more extensive learning to strain the important and related features from the whole sequences, but also retention of long-term ordering information. If the PPI prediction task and the workings of the considered DNs are carefully observed, it can be concluded that these DL architectures can contribute a lot to the considered prediction tasks and could be an emerging area for researchers.

Convolutional Neural Network

A convolutional neural network is a branch of DL algorithms that can take an input in the form of an image, allocate learnable weights and biases to various features of the image, and distinguish one from another with minimal pre-processing compared to other classification algorithms [43]. The structure of a CNN is basically a feed-forward neural network whose neurons respond to nearby units within a coverage region, and it has outstanding performance for data feature extraction [44]. The output value is computed using forward propagation, and the weights and biases are adjusted using backpropagation. Figure 6 shows that the structure of a CNN comprises the input layer, the convolutional layer, the subsampling layer, the fully connected layer, and the output layer.

Fig. 6 The baseline structure of a CNN

The feature map Ml at the lth layer is computed as [44]:

$$M_{l} = f(M_{l - 1} \circ w_{l} + b_{l} ),$$
(4)

where wl is the weight matrix of the convolution kernel of the lth layer, bl denotes the bias (offset) vector, f represents the activation function, and the operator ∘ denotes the convolution operation. The subsampling layer usually follows the convolutional layer, and the feature map is sampled according to given rules. Supposing Ml is a subsampling layer, its sampling formula is:

$$M_{l} = {\text{subsampling}}(M_{l - 1}).$$
(5)

The fully connected layer is responsible for the classification of the features extracted via the several convolution and subsampling operations. The fundamental mathematical notion of a CNN is to map the input matrix MO to a new feature representation R through multi-layer data transformation.

$$R(l) = {\text{Map}}(C = c_{l} |M_{O} ;(w,b))$$
(6)

where cl represents the lth label class, MO denotes the input matrix, and R denotes the feature expression. The goal of CNN training is to minimize the network loss function R(w, b). At the same time, to ease the over-fitting problem, the final loss function Z(w, b) is usually controlled by a norm, with the regularization intensity controlled by the parameter λ.

$$Z(w,b) = R(w,b) + \frac{\lambda }{2}w^{T} w.$$
(7)
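
In Keras terms, the pipeline of Fig. 6 and the regularized objective of Eq. (7) can be sketched as below; the l2 kernel regularizer adds the (λ/2)wᵀw penalty of Eq. (7) to the data loss. A 1D convolution over a one-hot-encoded protein sequence is assumed, and λ and all layer sizes are illustrative only:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense
from tensorflow.keras.regularizers import l2

lam = 1e-4  # regularization strength, the lambda of Eq. (7)

model = Sequential([
    # Convolutional layer, Eq. (4): M_l = f(M_{l-1} conv w_l + b_l)
    Conv1D(32, kernel_size=3, activation="relu",
           kernel_regularizer=l2(lam / 2), input_shape=(600, 20)),
    # Subsampling layer, Eq. (5)
    MaxPooling1D(pool_size=2),
    Flatten(),
    # Fully connected layer classifies the extracted features, Eq. (6)
    Dense(1, activation="sigmoid", kernel_regularizer=l2(lam / 2)),
])
# Training minimizes Z(w, b) = data loss + (lambda/2) w^T w, Eq. (7)
model.compile(optimizer="adam", loss="binary_crossentropy")
```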

Numerous research papers have been published in this domain. In the next section, the related papers are briefly discussed along with their objectives, approaches, datasets considered, and performance measures.

Approaches for Sequence-Based Protein–Protein Interaction Prediction Using Deep Networks

To the best of our knowledge, around 30 research papers have been published to date on PPI prediction using DNs with sequence information as input; this is also depicted by the publication analysis of sequence-based PPI prediction using DNs in Fig. 7. This section details all the studies performed on PPI prediction tasks using DNs so far; a summary is provided in Table 2. Of the 30, four papers identify PPIs using biomedical text datasets that are part of the Biomedical Natural Language Processing (BioNLP) [45] community, and the rest use physical protein-pair interaction datasets. The studies are therefore classified on the basis of year of publication, research objectives, approach to predict PPIs, type of dataset used, and hyperparameters of the network. The term 'Strategy' written after each section indicates the category of approach in the table. All the important abbreviated terms of the table are expanded in the corresponding text, whereas the basic abbreviations are provided after the abstract. The detailed description in this section is broadly divided on the basis of the dataset used. For better understanding, the abbreviated forms listed in Table 1 are used for the datasets considered by the cited papers in subsequent sections.

Fig. 7 Publication analysis of PPI prediction approaches using DNs

Table 1 Short names given for datasets considered by cited papers
Table 2 Publication analysis of DN approaches in prediction of sequence-based PPIs

Prediction Using Paired Protein Interaction Dataset

Some scholars showed that DNs are capable of capturing the potential features from raw protein input data on their own, while others include hand-crafted features with DNs to enhance the performance of PPI prediction tasks. Therefore, this sub-section is further categorized according to the inclusion or exclusion of manual feature engineering.

Strategy-A: Inclusion of Manually Crafted Features

The most important factor in developing a computational technique for the prediction of PPIs is to mine highly discriminative features that can well define proteins. Several publications have proposed novel methods for representing protein information numerically, as shown in Table 3; these are popularly used to produce proficient methods that extract protein interaction information more finely.

Table 3 Intuition behind some popular manually crafted features used by cited papers under Strategy A

The use of DL algorithms in the sequence-based PPI prediction task began in 2017 [46] with a proposal to use an SAE to filter heterogeneous features in a low-dimensional space. The protein sequences were numerically represented using the AC and CT methods and then fed to the model for training with tenfold CV. The authors observed that with one hidden layer, both the AC model with 400 neurons and the CT model with 700 neurons attained the best performances, and concluded that the prediction performance of the model does not depend on the number of neurons and layers. For the final model construction, they took AC because of its better performance, trained it on the entire benchmark dataset, and compared the results with previous ML approaches that used the same dataset. Following a similar pattern, Du et al. [47] employed five widely used descriptors to represent protein sequences, which were then effectively learned by a DNN model named DeepPPI. The authors compared the performance of DeepPPI using two different network architectures: one connecting the two inputs in a single network, the other using a separate network for each protein. The predictor was evaluated after setting the best hyperparameters for the network, and the obtained results were compared with existing approaches; the training time of DeepPPI is better than that of SVM, AdaBoost, and RF. Continuing this trend, Wang et al. [48] predicted PPIs by feeding a protein feature vector, a combination of the proposed MOS descriptor with AA classification, into a DNN. Unlike previous protein representations such as AC, CT, and LD, the proposed MOS descriptor considers the order relationship of the whole AA sequence. The authors gave suitable reasons for choosing the network parameters for the task, such as the ReLU AF, the ADAM optimizer, and cross-entropy as the cost function. Other parameters, such as network depth and width and the LR, were computed for the particular method by varying their ranges and selecting the best values. Finally, the authors trained the DNN model with AC, CT, and LD separately and compared their performance with the proposed DNN-MOS model on the benchmark dataset as well as a non-redundant dataset. Subsequently, Guo et al. presented a DL framework based on the properties of AAs that contribute to the PPI information [49]. First, a feature vector was created according to the proposed descriptor named conjoint AAindex modules (CAM), which encodes a conjoint AA unit of a protein sequence according to the AAindex database and repeats the process over the whole sequence to generate a sequence profile. To scrutinize the CAM patterns in the sequence profile, multiple dense operators were employed, followed by the ReLU function to introduce non-linearity. Finally, an LSTM layer was stacked to leverage its ability to hold long-term order dependencies, and logistic regression was applied to compute the results.
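
As an illustration of one such descriptor, the CT method groups the 20 standard AAs into seven classes and counts every class triad along the sequence, giving a 7 × 7 × 7 = 343-dimensional vector. A minimal sketch follows; the class grouping shown is the commonly used CT convention and is stated here as an assumption:

```python
import numpy as np

# Common CT grouping of the 20 AAs into 7 classes (assumed here; cf. [10]).
CT_CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: i for i, grp in enumerate(CT_CLASSES) for aa in grp}

def conjoint_triad(sequence):
    """Count the frequency of every class triad: a 7*7*7 = 343-dim vector."""
    vec = np.zeros(343)
    classes = [AA_TO_CLASS[aa] for aa in sequence if aa in AA_TO_CLASS]
    for a, b, c in zip(classes, classes[1:], classes[2:]):
        vec[a * 49 + b * 7 + c] += 1
    return vec / max(len(classes) - 2, 1)  # normalize by the triad count

features = conjoint_triad("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
```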

Following the same fashion of introducing novel feature generation, Yao et al. [50] combined DL with representation learning (RL) [51] to predict PPIs. The purpose of including RL was to learn the data patterns automatically from the raw data; the resulting informative representation is then utilized by the DL model. The authors proposed the DeepFE-PPI framework, which exploits the benefits of RL by building an informative representation using Res2vec (inspired by word2vec) and the benefits of DL by extracting effective features through a hierarchical multi-layer architecture for classifying the PPI task. DeepFE-PPI used two separate DNN modules to squeeze latent features out of the two embedding vectors and a joint module for the PPI classification task via the softmax function. Like Wang et al. [48], the authors selected the best-suited hyperparameters of the DL model for PPI prediction by analyzing ranges of the protein length, residue dimension, and network depth. Along with the standard performance measures, the authors compared the training time with different existing algorithms using the most optimized network parameters and concluded that DeepFE-PPI holds the fourth position among SVM, DT, RF, NB, KNN, and logistic regression; although NB is the fastest algorithm, its results are comparatively poor.
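
The Res2vec idea mirrors word2vec: each residue is treated as a "word" and each sequence as a "sentence", and embeddings are learned from residue co-occurrence. A hedged sketch with gensim follows; the toy sequences, vector size, and window are arbitrary and not the values tuned in [50]:

```python
from gensim.models import Word2Vec

# Each protein sequence becomes a "sentence" of residue "words".
sequences = ["MKTAYIAKQR", "GAVLIPFMW", "STCYNQDEKR"]  # toy data
sentences = [list(seq) for seq in sequences]

# Skip-gram model over residues; vector_size and window are illustrative.
model = Word2Vec(sentences, vector_size=20, window=5, min_count=1, sg=1)

# A protein is then represented by its per-residue embedding vectors.
embedding = [model.wv[res] for res in sequences[0]]
```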

Inspired by the workings and advancements of DNNs as well as the characteristics of different feature extraction methods, Zhang et al. introduced EnsDNN, an ensemble DNN-based approach for PPI prediction [72]. The authors put effort into the dataset setup because of the scarcity of suitable data for the new disease; according to the authors, this algorithm-based mapping is also the first such approach in this field. The proposed algorithmic approach made use of an AVL tree because of its fast search and balancing properties. To generate the AVL tree, the one-letter code of each AA was considered and arranged in alphabetical order, and by following the insertion and deletion rules of a balanced AVL tree, the final structure was obtained. Then, the depth value of each AA was determined and every AA sequence was converted accordingly into numerical form. Because the authors compared the proposed mapping method with existing ones, the input sequences were mapped using every mapping approach and then underwent a normalization process. The obtained result was then fed to a DeepBiRNN for classification. The structure of the DeepBiRNN was: three BiRNN layers with the ReLU AF and 64, 32, and 16 units respectively, followed by Flatten, batch normalization, and Dropout, and then two FC layers (a sketch is given below). The resulting performance was favorable with this novel algorithmic mapping process.
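
A Keras sketch of the DeepBiRNN stack as described above (three bidirectional recurrent layers with ReLU and 64/32/16 units, then Flatten, batch normalization, Dropout, and two FC layers); the input shape, dropout rate, and FC widths are assumptions:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import (SimpleRNN, Bidirectional, Flatten,
                                     BatchNormalization, Dropout, Dense)

model = Sequential([
    # Three BiRNN layers with ReLU and 64/32/16 units, as described above.
    Bidirectional(SimpleRNN(64, activation="relu", return_sequences=True),
                  input_shape=(1000, 1)),  # assumed length-1000 numeric encoding
    Bidirectional(SimpleRNN(32, activation="relu", return_sequences=True)),
    Bidirectional(SimpleRNN(16, activation="relu", return_sequences=True)),
    Flatten(),
    BatchNormalization(),
    Dropout(0.5),                   # assumed rate
    Dense(32, activation="relu"),   # assumed FC width
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```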

A notable experiment was done to improve the performance of a CNN model in PPI tasks by proposing an encoding technique [73]. The proposed Sequence-Statistics-Content (SSC) is basically a three-channel encoding that presents more refined features and decreases the effect of local sequence similarity. The outputs of SSC, the statistical information and bigram encoding information of the protein sequence, were then fed to a 2D CNN whose 2D convolutional kernels offer ample features instead of the distinct features of one-hot encoding. The authors then evaluated the performance using different datasets and compared the results with existing approaches. Additionally, the effects of different SSC channel combinations were shown. The overall results provide valuable insights for DNs in the PPI prediction task.

Figure 8 presents the best performance in terms of accuracy, with the most suitable parameter settings, of the various aforementioned DN approaches to predict PPIs. The performance measures of some papers [72] are either multiple or unclear, so those approaches are not included in the figure. It can be observed that the approaches of [58] and [69] perform well on the benchmark dataset and the H. pylori dataset.

Fig. 8 Performance analysis of the highest accuracy reported by various approaches of Strategy-A (in %). The dataset name is mentioned in brackets along with the best accuracy. The approach used by [69] performs best using the 'k' dataset

Strategy-B: Auto-Feature-Engineering-Based PPI Prediction Approaches

To our knowledge, the first research on sequence-based PPI prediction using DNs that was based solely on auto-feature engineering, i.e., without the inclusion of manually extracted features, was presented by Li et al. in 2018 and termed DNN-PPI [74]. For an NN architecture to learn the data, the input should be in numerical form; therefore, the authors randomly assigned each AA a natural number and converted the protein sequences accordingly. Within the proposed framework, the embedding layer captured information on the semantic association among AAs, position-based features of the protein sequences were captured by three-layered CNNs, and short- as well as long-term dependencies were covered by the LSTM layer; the concatenated features were then fed to the FC layer with dropout to identify potential features. Besides the favorable results of DNN-PPI, the authors also tested the performance with the number of CNN layers changed to 1 and 2 and found no significant difference in accuracy, but speedier convergence in loss with more layers. Further, Gonzalez-Lopez et al. [75] performed PPI prediction through embedding systems and RNNs, bypassing the need for feature engineering. A tokenization process was used to represent each sequence in numerical form by assigning a token (an integer) to every triplet in the sequence (see the sketch below). In the NN, each protein's representation of the pair was fed and processed separately in two branches with similar architectures. The embedding, recurrent, and FC layers of the architecture performed their specific roles. Along with this, two important mechanisms, Dropout and batch normalization, were used to avoid over-fitting and to standardize inputs. Moreover, schemes like early stopping and reducing the LR on stagnation were considered to avoid wasting resources and to achieve better local minima. The observation from evaluations with different datasets is that the performance of the proposed DeepSequencePPI approach is similar to that of existing methods that combine hand-crafted features with a DL approach; the authors thereby concluded that if sufficient data is available, DNs can properly model the PPI prediction task without the inclusion of manually created features.
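
The triplet tokenization used by DeepSequencePPI can be sketched in a few lines: every overlapping 3-gram of a sequence is mapped to an integer token from a dictionary built over the data. This is a minimal sketch; [75] may differ in details such as overlap, padding, and unknown-token handling:

```python
def tokenize_triplets(sequences):
    """Map every overlapping residue triplet to an integer token."""
    vocab = {}
    tokenized = []
    for seq in sequences:
        tokens = []
        for i in range(len(seq) - 2):
            triplet = seq[i:i + 3]
            # Token 0 is reserved here for padding/unknown triplets.
            tokens.append(vocab.setdefault(triplet, len(vocab) + 1))
        tokenized.append(tokens)
    return tokenized, vocab

tokens, vocab = tokenize_triplets(["MKTAYIAK", "GAVLIPFM"])
```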

To handle huge training data while effectively capturing the potential features of protein pairs, a remarkable DL approach (DPPI) was implemented by Hashemifar et al. [76], with generalization characteristics that let it be used for different applications with slight tuning of the parameters. The design of the DPPI model rests on the successful execution of three main modules. The first and core module, the convolutional module, consists of a set of filters (convolutional layer, ReLU, batch normalization, and pooling layer) responsible for mapping the protein sequences to a representation suitable for further processing, by detecting patterns that characterize the interaction information. The input in DPPI was taken as sequence profiles, generated on the basis of probability using the PSI-BLAST algorithm. The next module, Random Projection (RP), consists of two FC sub-networks and is responsible for projecting the convolved representations of the two proteins into two different spaces; the word 'random' refers to taking random weights so that the model can learn motifs with different patterns. The outcome of the RP module is a refined representation of the proteins, which is then taken as input by the last module, the prediction module. The prediction module computes a probability score by performing element-wise multiplication on the representations taken from the previous module, which indicates the interaction probability of the two proteins in a pair. This Siamese-like convolutional NN behaved very well when evaluated with different benchmark datasets. The authors asserted that DPPI can serve as a principal model for sequence-based PPI prediction and is generalizable to diverse applications.

Another effective approach to capture the mutual influence of the protein pairs in PPI prediction, PIPR [77], was implemented by Chen et al. based on a Siamese architecture. Besides binary prediction, PIPR was designed to address two more challenging tasks: estimation of binding affinity and prediction of interaction type. PIPR incorporates a deep Siamese residual RCNN-based protein sequence encoder to better apprehend the potential features for PPI representation. This deep encoder comprises many occurrences of convolution layers with pooling and bidirectional residual gated recurrent units, so as to ease training and greatly diminish parameter updates. For the numerical representation of the protein sequences, PIPR transformed the recognized AAs based on their similarity in terms of their co-occurrences as well as their electrostatic and hydrophobic properties and pre-trained the obtained embedding. The resulting AA embeddings were then fed to the encoder to capture the latent information of the proteins in a pair. The output of the encoder is a refined embedding of the two sequences, which are merged to generate a pair vector and passed to an MLP with Leaky ReLU [78] for PPI classification (a toy sketch of this Siamese setup follows). The learning tasks were optimized by a mean-squared loss for the binding-affinity estimation task and a cross-entropy loss for the remaining two tasks. PIPR produced promising results, effectively covering the mutual influence between the proteins in a pair, and demonstrated its generalization with satisfactory results in all three challenging tasks without the inclusion of hand-crafted features.
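
The Siamese idea shared by DPPI and PIPR can be sketched as a toy Keras model: one encoder processes both proteins of a pair, and the two encodings are merged element-wise (as in DPPI's prediction module) before classification. The simple convolutional encoder and all sizes below are illustrative, not the residual RCNN of PIPR:

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import (Embedding, Conv1D, GlobalMaxPooling1D,
                                     Multiply, Dense)

def build_encoder(vocab_size=26, seq_len=500):
    """Shared (Siamese) sequence encoder: embedding + convolution + pooling."""
    inp = Input(shape=(seq_len,), dtype="int32")
    x = Embedding(vocab_size, 16)(inp)  # AA embedding (sizes illustrative)
    x = Conv1D(64, 3, activation="relu")(x)
    x = GlobalMaxPooling1D()(x)
    return Model(inp, x)

encoder = build_encoder()
in_a = Input(shape=(500,), dtype="int32")
in_b = Input(shape=(500,), dtype="int32")
# The same encoder weights process both proteins of the pair.
pair = Multiply()([encoder(in_a), encoder(in_b)])  # element-wise merge
out = Dense(1, activation="sigmoid")(pair)
siamese = Model([in_a, in_b], out)
siamese.compile(optimizer="adam", loss="binary_crossentropy")
```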

Richoux et al. [83] designed and compared two DL models, a FC model and a recurrent model, intended to show the pitfalls that need to be avoided while predicting PPIs. Later, D-SCRIPT (Deep Sequence Contact Residue Interaction Prediction Transfer), a DL method, was proposed to address the limitation of training data size as well as to improve generalization across species, with the hypothesis that a model trained on sequential data, given input protein features that strongly characterize the interaction information and a well-designed model structure, can generate a representation that depicts the behavior of structural interaction. The D-SCRIPT model design is very similar to PIPR [77] and DPPI [76], with the added impression of protein structure: it first builds input features using Bepler and Berger's pre-trained model [72], in contrast to approaches using the BLAST algorithm, which performs pairwise comparison to find sequence similarity [87].

Strategy-C: Prediction Using Biomedical Text Dataset

The first implementation in this category is by Hsieh et al. [88], who implemented the PPI identification task using a bi-directional RNN with an LSTM approach. The method includes three layers: an embedding layer, which takes the protein entities in sentence form and converts each word to its corresponding embedding, forming a low-dimensional real-valued vector; basically, this layer captures the syntactic and semantic information by taking the effects of neighboring words into account. The obtained vector representation is then fed to the recurrent layer, more specifically a Bi-RNN. The resulting contextual and more refined information obtained by the Bi-RNN is then taken by a FC layer for PPI classification. The authors adopted two testing methods, tenfold CV and cross-corpus (CC), to evaluate the performance using the two largest PPI corpora, a and c, and concluded from the favorable CV results that DNs are more suitable than manual feature engineering for extracting rich context information from larger datasets.

In the very next year, remarkable work in this domain was published by Yadav et al. [91], employing the shortest dependency path (SDP) and an AE. An embedding layer concatenates the embeddings of SDP, POS, and position to generate a vector representation suitable as input for the Bi-LSTM. The Bi-LSTM module comprises three layers, sequence, max-pooling, and MLP, which are responsible for eliminating noise, capturing contextual and maximally feature-rich information from the obtained embedding, and making the PPI prediction accordingly. The model was evaluated on two popular corpora with favorable results.

The same group of authors [92] implemented the same task with slight modifications to the model: they included an attention layer and used a stacking strategy in the Bi-LSTM unit; the remaining work and architecture are the same as [89]. An LSTM model with multiple hidden layers, each having numerous memory units, is termed a stacked LSTM. The authors employed a vertically stacked LSTM to capture a high-level abstract representation of every word in the sentence. The output of this layer is the hidden-state representation of its last layer, which is then taken as input by the attention layer. The goal of the attention layer is to generate clues that can be a deciding factor for the interaction information or, in simpler words, it tells how much attention is to be given to a particular word at the present state; it is computed by multiplying attention weights with the obtained hidden representations. The model was evaluated on five benchmark corpora and showed a significant improvement over [89].
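
The attention computation described here can be sketched as a learned weighting over the stacked Bi-LSTM's hidden states. The dot-product scoring against a learned query vector below is one common formulation and is an assumption, not necessarily the exact scheme of [92]:

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def attend(H, q):
    """H: (T, d) hidden states of the T words; q: (d,) learned query vector.
    Returns the attention-weighted sentence vector and per-word weights."""
    scores = H @ q           # how much attention each word deserves
    alpha = softmax(scores)  # attention weights over the T words
    return alpha @ H, alpha  # weighted sum of the hidden states

rng = np.random.default_rng(2)
H = rng.standard_normal((12, 64))  # 12 words, 64-dim states (hypothetical)
context, alpha = attend(H, rng.standard_normal(64))
```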

Besides the basic LSTM, which can only investigate sequential information, the tree LSTM (tLSTM) [93] can be a better option for scrutinizing extra information. Ahmed et al. [94] based their PPI identification work on the tLSTM and traversed the PPI-related sentences through a tree-like network topology in such a way that each unit of the tLSTM can gain information from its children. Additionally, to build the final model, the authors fused the output vector obtained from the tLSTM with an attention mechanism to calculate the strength of attention at each unit. This fusion of tLSTM with a structural attention mechanism was evaluated on five PPI corpora, including large and small ones, and outperformed the traditional comparative approaches. It was also observed that, due to differing distributions, fewer syntactic dependencies were captured, and thereby the model with the attention mechanism sometimes performed worse than the model without it.

Figure 10 depicts the best performance achieved by the various approaches under this strategy; the details of these measures are given in Table 2. It can be clearly observed from the figure that the inclusion of the stacking strategy and attention layer in [92] greatly enhanced the performance using corpus a and also proved superior to the other competitive approaches.

Fig. 10 Analysis of the highest performance reported by cited papers under Strategy-C (in %). The attention-layer approach used in [92] performed best using corpus 'a'

Figure 11 presents the count of papers published using each strategy. It can be seen that although DNs are known for their auto-feature engineering capability, there is still much to discover, because numerous researchers rely on hand-crafted features alongside DNs to improve performance.

Fig. 11 Categorization of the number of published papers according to strategy

Implementation of Cited Papers

This section presents the implementation results of two of the cited papers. The first paper is taken from Strategy-A [61]; it employed a hybrid classifier (DNN-XGB) along with a combination of three feature extraction methods, namely AAC, CT, and LD. The implementation was done on two datasets, k and r. All three feature types were extracted separately for each dataset; then two files were generated for the combined positive features and the combined negative features of AAC, CT, and LD. Lastly, these two feature files were used by the hybrid classifier for prediction (a minimal sketch of the hybrid idea is given below). The implementation results are shown in Fig. 12. This work was run on an environment with 8 GB RAM and an x64-based processor, using MATLAB R2016a [95] for feature generation and the keras [96] library of Python 3.8.2 for classification.
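
How exactly [61] fuses the two classifiers is not restated here; a hedged sketch follows, assuming a simple late fusion that averages the predicted probabilities, with all model sizes illustrative:

```python
import numpy as np
from xgboost import XGBClassifier
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

def train_dnn_xgb(X, y):
    """Train a DNN and an XGBoost classifier on the same AAC+CT+LD features."""
    dnn = Sequential([Dense(128, activation="relu", input_shape=(X.shape[1],)),
                      Dense(1, activation="sigmoid")])
    dnn.compile(optimizer="adam", loss="binary_crossentropy")
    dnn.fit(X, y, epochs=10, batch_size=64, verbose=0)

    xgb = XGBClassifier(n_estimators=200, eval_metric="logloss")
    xgb.fit(X, y)
    return dnn, xgb

def predict_hybrid(dnn, xgb, X):
    # Assumed fusion rule: average the two predicted interaction probabilities.
    p = 0.5 * dnn.predict(X, verbose=0).ravel() + 0.5 * xgb.predict_proba(X)[:, 1]
    return (p >= 0.5).astype(int)
```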

Fig. 12 Performance analysis of manual implementation of the approaches employed by [61, 75]. A: implementation of [61] on the k dataset; B: implementation of [61] on the r dataset; C: implementation of [75] on the r dataset

The second paper is taken from Strategy-B [75], which advocated auto-feature engineering for PPI prediction. The implementation was done on the r dataset in the Google Colaboratory [97] environment, using the keras library with Python 3.8. The FASTA file [98] of AA sequences was taken online for tokenization and generation of the n-gram dictionary. The obtained results are shown in Fig. 12.

The details of the performance measures are given in the cited papers. The observation from Fig. 12 is that although DL architectures are known for their auto-feature engineering capability, there is still much to discover, since numerous researchers enlist hand-crafted features with DL to improve performance, as in [61]. If the nature of DL architectures is studied deeply, as the authors in [75] did, and applied according to the problem at hand, the need for and the effort of generating protein features can be bypassed.

Comparison with State-of-the-art Methods

To better convey the improved performance of PPI prediction using DNs, this section compares some of the discussed approaches with state-of-the-art methods proposed for the same task. Table 4 shows the best-reported results of various existing approaches for sequence-based PPI prediction, in which the authors used AC [13], ACC [13], CT [10], LD [11], MCD [15], MLD [14], and their combinations [99] with different ML-based classifiers. Some interesting approaches, like the phylogenetic bootstrap [100], the hyperplane-distance nearest-neighbor algorithm (HKNN) [101], an ensemble of HKNN [102], and K-local signature products [54], were also proposed. It can be clearly observed from Table 4 that DNs are now a well-suited choice for the problem at hand, with favorable outcomes.

Table 4 Comparison of the deliberated approaches with state-of-the-art methods

Conclusion

Recently, DL technology has come into the limelight with numerous scientific studies and has also become a hot topic in business applications. In the area of bioinformatics, where incredible advances have been made with ML, even more significant outcomes are expected from DL. This paper provides a comprehensive review of three DL architectures, DNNs, CNNs, and RNNs, including their variants, in the domain of PPI prediction using sequence information, and broadly discusses the various approaches in terms of input data, objectives, and the structure of the DL architecture along with their best-suited parameters.

It is observed that all the considered architectures are capable of providing effective results in this area, but to fully utilize the competencies of these approaches, several budding challenges remain, such as inadequate data and choosing a suitable architecture with favorable hyperparameters. Advanced and deep study is also essential to scale up the popularity of DL approaches. The detailed discussion presented herein, with every possible piece of information carefully mined, can therefore help researchers further explore this area. It is believed that this literature survey will bring valuable vision to assist scholars in applying DNs to PPI prediction in future research.