
In recent years, the association of miRNAs with complex human diseases has attracted the attention of a wide range of researchers. This research has generated a large amount of data, and a large number of related databases have been established, such as HMDD [1], miR2Disease [2], dbDEMC [3], miRCancer [

Materials and methods

Method overview

To predict potential miRNA–disease associations, a new prediction model, KATZNCP, was proposed, which consists of three stages. The detailed inference steps are shown in the flowchart in Fig. 1.

Fig. 1 The overall architecture of KATZNCP

Step 1 Data preparation. First, the known miRNA–disease association data and the disease semantic similarity data were downloaded from relevant databases. Then, the miRNA functional similarity and the Gaussian interaction profile kernel similarity were calculated. Finally, the integrated disease similarity network and the integrated miRNA similarity network were constructed.

Step 2 Association score estimation prediction. The known miRNA–disease association network, the integrated disease similarity network, and the integrated miRNA similarity network were combined into a single heterogeneous network. The KATZ algorithm was then applied to this network to obtain the estimated miRNA–disease association prediction scores.

Step 3 Association score refinement prediction. The integrated disease similarity network and the integrated miRNA similarity network were each projected onto the estimated prediction network. The two results were weighted to obtain the final miRNA–disease association prediction scores.

Known miRNA–disease associations

To evaluate the performance of the models fairly, benchmark datasets were employed in the experiments. Specifically, the known miRNA–disease association dataset was downloaded from HMDD v2.0. After screening, 5430 clinically or experimentally verified miRNA–disease associations between 495 miRNAs and 383 diseases were obtained. These associations were represented by a Boolean matrix MD: if miRNA \({\text{m}}_{{\text{i}}}\) is associated with disease \({\text{d}}_{{\text{j}}}\), the entry MD(i,j) is set to 1; otherwise, it is set to 0.

Semantic similarity calculation of disease

According to the hierarchical information of diseases in MeSH (Medical Subject Headings) [1], the relationship between different diseases can be described as a directed acyclic graph (DAG). For any disease d, its DAG can be represented as DAG(d) = (N(d), E(d)), where N(d) is the set of ancestor nodes of disease d (including d itself) and E(d) is the set of corresponding edges. Many scholars have used this representation as a basis for calculating the similarity between diseases. Wang et al. [70] proposed a semantic disease similarity method based on the assumption that the more ancestor entries two diseases share, the more similar they are. The contribution of an ancestor node \({\text{d}}_{{\text{a}}}\) to the semantic value of disease d was expressed by the following formula:

$$D_{d} \left( {d_{a} } \right) = \left\{ {\begin{array}{*{20}l} 1 \hfill & {if\, d_{a} = d} \hfill \\ {\max \left\{ {0.5*D_{d} \left( {d_{a}^{\prime } } \right)|d_{a}^{\prime } \in \,children \,of \,d_{a} } \right\}} \hfill & {if \,d_{a} \ne d} \hfill \\ \end{array} } \right.$$

Based on formula (1), the semantic value DV(d) of disease d was defined as:

$$DV\left( d \right) = \mathop \sum \limits_{{d_{a} \in N\left( d \right)}} D_{d} \left( {d_{a} } \right)$$

Finally, the semantic similarity between diseases \(d_{i}\) and \(d_{j}\) was defined as follows:

$$DD1\left( {i,j} \right) = \frac{{\mathop \sum \nolimits_{{d_{t} \in N\left( {d_{i} } \right) \cap N\left( {d_{j} } \right)}} \left( {D_{{d_{i} }} \left( {d_{t} } \right) + D_{{d_{j} }} \left( {d_{t} } \right)} \right)}}{{DV\left( {d_{i} } \right) + DV\left( {d_{j} } \right)}}$$

The disease similarity matrix calculated by formula (3) was named DD1.
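As a minimal sketch of formulas (1) to (3), the following Python code computes the semantic similarity on a small hand-made hierarchy. The `PARENTS` dictionary, the node names, and the function names are hypothetical illustrations and are not taken from MeSH; only the decay factor 0.5 comes from formula (1).

```python
# Hypothetical toy hierarchy (child -> parents); not real MeSH data.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A"],
    "D": ["A", "B"],
}

def contributions(d, parents=PARENTS, delta=0.5):
    """Formula (1): contribution D_d(a) of every node a in N(d), including d."""
    contrib = {d: 1.0}
    stack = [d]
    while stack:
        node = stack.pop()
        for parent in parents[node]:
            value = delta * contrib[node]
            # Keep the maximum over all children of `parent` inside DAG(d).
            if value > contrib.get(parent, 0.0):
                contrib[parent] = value
                stack.append(parent)
    return contrib

def semantic_value(d):
    """Formula (2): DV(d) is the sum of all contributions in DAG(d)."""
    return sum(contributions(d).values())

def semantic_similarity(di, dj):
    """Formula (3): shared-ancestor contributions over total semantic value."""
    ci, cj = contributions(di), contributions(dj)
    shared = set(ci) & set(cj)
    numerator = sum(ci[t] + cj[t] for t in shared)
    return numerator / (semantic_value(di) + semantic_value(dj))

print(semantic_similarity("C", "D"))  # 0.375 for this toy hierarchy
```

In this toy case, C and D share the ancestors A and root, so the numerator is (0.5 + 0.5) + (0.25 + 0.25) = 1.5 and the denominator is 1.75 + 2.25 = 4.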

Xuan et al. [15] proposed another method for calculating the semantic similarity of diseases. This method expresses the contribution of an ancestor node to the disease as follows:

$$D_{d} \left( {d_{a} } \right) = - \log \left( {\frac{the\; number\, of\; N\left( d \right)}{{the\; number\; of\; disease}}} \right)$$

Substituting formula (4) into formulas (2) and (3) yields a second disease similarity matrix, which was named DD2.

Functional similarity calculation of miRNA

Based on the hypothesis that functionally similar miRNAs were likely to be associated with semantically similar diseases and vice versa, Wang et al. [17] calculated the functional similarity of miRNA through the disease semantic similarity and known miRNA–disease associations. The same method was used to calculate the functional similarity of miRNAs.

For any two miRNAs \({\text{m}}_{{\text{i}}}\) and \({\text{m}}_{{\text{j}}}\), the sets of diseases associated with them were denoted as \(D^{{\left( {m_{i} } \right)}} = \left\{ {d_{1} ,d_{2} , \ldots ,d_{{m^{\prime } }} } \right\} = \left\{ {d_{{i^{\prime } }} } \right\}_{m} \subset D\) and \(D^{{\left( {m_{j} } \right)}} = \left\{ {d_{{1^{\prime \prime } }} ,d_{{2^{\prime \prime } }} , \ldots ,d_{{n^{\prime \prime } }} } \right\} = \left\{ {d_{{j^{\prime \prime } }} } \right\}_{n} \subset D\), respectively. The functional similarity of miRNA \({\text{m}}_{{\text{i}}}\) and miRNA \({\text{m}}_{{\text{j}}}\) was calculated as follows:

$$mm_{ij} = \frac{{\mathop \sum \nolimits_{{d_{t} \in D^{{\left( {m_{i} } \right)}} }} S\left( {d_{t} ,D^{{\left( {m_{j} } \right)}} } \right) + \mathop \sum \nolimits_{{d_{t} \in D^{{\left( {m_{j} } \right)}} }} S\left( {d_{t} ,D^{{\left( {m_{i} } \right)}} } \right)}}{m + n}$$

where m and n denote the numbers of diseases associated with miRNA \({\text{m}}_{{\text{i}}}\) and miRNA \({\text{m}}_{{\text{j}}}\), respectively. \({\text{S}}\left( {{\text{d}}_{{{\text{i}}^{\prime } }} ,{\text{D}}^{{\left( {{\text{m}}_{{\text{j}}} } \right)}} } \right)\) represents the degree of association between a given disease \({\text{d}}_{{{\text{i}}^{\prime } }}\) and a given set of diseases \(D^{{\left( {m_{j} } \right)}}\). It was calculated as follows:

$$S\left( {d_{{i^{\prime } }} ,D^{{\left( {m_{j} } \right)}} } \right) = \mathop {\max }\limits_{{d_{t} \in D^{{\left( {m_{j} } \right)}} }} \left( {dd_{{i^{\prime } t}} } \right)$$

In addition, matrices \({\text{MM}}_{1}\) and \({\text{MM}}_{2}\) were used to denote the miRNA functional similarity matrices obtained by DD1 and DD2 calculations, respectively.
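The two formulas above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `functional_similarity` and the toy matrices are hypothetical, the disease similarity matrix is assumed to be symmetric, and pairs in which either miRNA has no known disease are left at 0 (the source does not specify this case).

```python
import numpy as np

def functional_similarity(MD, DD):
    """MD: (n_m, n_d) binary associations; DD: (n_d, n_d) disease similarity."""
    n_m = MD.shape[0]
    disease_sets = [np.flatnonzero(MD[i]) for i in range(n_m)]
    MM = np.zeros((n_m, n_m))
    for i in range(n_m):
        for j in range(n_m):
            Di, Dj = disease_sets[i], disease_sets[j]
            if len(Di) == 0 or len(Dj) == 0:
                continue  # assumption: no known diseases -> similarity stays 0
            block = DD[np.ix_(Di, Dj)]          # similarities between the two sets
            s_i_to_j = block.max(axis=1).sum()  # sum over d_t in D(m_i) of S(d_t, D(m_j))
            s_j_to_i = block.max(axis=0).sum()  # sum over d_t in D(m_j) of S(d_t, D(m_i))
            MM[i, j] = (s_i_to_j + s_j_to_i) / (len(Di) + len(Dj))
    return MM

# Toy data (hypothetical): 2 miRNAs, 3 diseases.
DD = np.array([[1.0, 0.5, 0.0],
               [0.5, 1.0, 0.2],
               [0.0, 0.2, 1.0]])
MD = np.array([[1, 1, 0],
               [0, 1, 1]])
MM = functional_similarity(MD, DD)
print(np.round(MM, 3))
```

For the toy data, the off-diagonal entry is (0.5 + 1.0 + 1.0 + 0.2) / 4, which is about 0.675, and each miRNA has similarity 1 with itself.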

Gaussian interaction profile kernel similarity calculation

When the similarity among diseases is measured by semantic similarity alone, the similarity between two diseases is set to 0 if the semantic data for the pair are missing. To reduce the impact of this sparsity on prediction performance, the Gaussian kernel function [71] was applied to the interaction profiles of the nodes in the known association network. The Gaussian interaction profile kernel similarity between diseases \(d_{i}\) and \(d_{j}\) was calculated as follows:

$$GD\left( {i,j} \right) = exp\left( { - \gamma_{d} \parallel MD\left( {:,i} \right) - MD\left( {:,j} \right)\parallel^{2} } \right)$$

where \(MD\left( {:,i} \right)\) is the i-th column of the known miRNA–disease association matrix \(MD\). Parameter \(\gamma_{d}\) controls the kernel bandwidth of the Gaussian interaction profile kernel similarity. It is calculated using the following equation [71]:

$$\gamma_{d} = \frac{1}{{\frac{1}{{n_{d} }}\mathop \sum \nolimits_{i = 1}^{{n_{d} }} \left\| {MD\left( {:,i} \right)} \right\|^{2} }}$$

The Gaussian interaction profile kernel similarity among miRNAs can be calculated using the same method:

$$GM\left( {i,j} \right) = exp\left( { - \gamma_{l} \left\| {MD\left( {i,:} \right) - MD\left( {j,:} \right)} \right\| ^{2} } \right)$$

where \(MD\left( {i,:} \right)\) is the i-th row of the matrix \(MD^{{n_{m} \times n_{d} }}\). Parameter \(\gamma_{l}\) can be obtained by the following equation [71]:

$$\gamma_{l} = \frac{1}{{\frac{1}{{n_{m} }}\mathop \sum \nolimits_{i = 1}^{{n_{m} }} \left\| {MD\left( {i,:} \right)} \right\|^{2} }}$$
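The four equations above share one computation, applied either to the columns of MD (diseases) or to its rows (miRNAs). A minimal NumPy sketch follows; the helper name `gip_kernel` and the toy matrix are hypothetical, and the code assumes that at least one profile is non-zero so that the bandwidth is defined.

```python
import numpy as np

def gip_kernel(profiles):
    """Gaussian interaction profile kernel between the rows of `profiles`."""
    sq_norms = (profiles ** 2).sum(axis=1)
    gamma = 1.0 / sq_norms.mean()  # bandwidth: inverse of the mean squared norm
    # Pairwise squared Euclidean distances between profiles.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dists)

# Toy association matrix (hypothetical): 4 miRNAs x 3 diseases.
MD = np.array([[1, 0, 1],
               [1, 0, 0],
               [0, 1, 0],
               [1, 1, 0]], dtype=float)

GM = gip_kernel(MD)    # miRNA kernel: profiles are the rows MD(i, :)
GD = gip_kernel(MD.T)  # disease kernel: profiles are the columns MD(:, i)
print(GM.shape, GD.shape)
```

Passing `MD.T` reuses the same function for diseases, which mirrors the statement that the miRNA kernel is obtained "using the same method".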

Integrated similarity construction

The previous sections described the disease semantic similarity, the miRNA functional similarity, and the Gaussian interaction profile kernel similarity for miRNAs and diseases. To address the sparsity of the original similarity matrices, these complementary data sources were integrated to quantify the similarity of each miRNA pair and each disease pair. The calculation was as follows:

$$ID\left( {i,j} \right) = \left\{ {\begin{array}{*{20}l} {\frac{{DD1\left( {i,j} \right) + DD2\left( {i,j} \right)}}{2}} \hfill & {\text{if } d_{i} \text{ and } d_{j} \text{ have semantic similarity}} \hfill \\ {GD\left( {i,j} \right)} \hfill & {\text{otherwise}} \hfill \\ \end{array} } \right.$$
$$IM\left( {i,j} \right) = \left\{ {\begin{array}{*{20}l} {\frac{{MM_{1} \left( {i,j} \right) + MM_{2} \left( {i,j} \right)}}{2}} \hfill & {\text{if } m_{i} \text{ and } m_{j} \text{ have functional similarity}} \hfill \\ {GM\left( {i,j} \right)} \hfill & {\text{otherwise}} \hfill \\ \end{array} } \right.$$
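The piecewise rule can be sketched with `numpy.where`. The source does not state how "have semantic similarity" is tested, so this sketch assumes that a pair has similarity when the averaged value is non-zero; the function name `integrate` and the toy matrices are likewise hypothetical.

```python
import numpy as np

def integrate(sim1, sim2, gip):
    """Average two similarity matrices; fall back to the GIP kernel where empty."""
    avg = (sim1 + sim2) / 2.0
    has_similarity = avg > 0  # assumption: zero means "no similarity available"
    return np.where(has_similarity, avg, gip)

# Toy disease similarities (hypothetical values).
DD1 = np.array([[1.0, 0.0, 0.4],
                [0.0, 1.0, 0.0],
                [0.4, 0.0, 1.0]])
DD2 = np.array([[1.0, 0.0, 0.2],
                [0.0, 1.0, 0.0],
                [0.2, 0.0, 1.0]])
GD = np.array([[1.0, 0.7, 0.7],
               [0.7, 1.0, 0.7],
               [0.7, 0.7, 1.0]])

ID = integrate(DD1, DD2, GD)
print(ID)
```

The same call with the two miRNA functional similarity matrices and GM would give IM. In the toy output, the pair (0, 2) keeps the averaged semantic value, while the pairs with missing semantic data take the kernel value 0.7.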

Association score estimation prediction

Based on the previously constructed integrated miRNA and disease similarities, the Katz method was used to obtain estimated prediction scores for miRNA–disease associations. The Katz method has been successfully applied to relationship prediction in social networks; it measures the similarity between two nodes by counting the walks of different lengths that connect them. First, a heterogeneous miRNA–disease network was constructed from the integrated miRNA–miRNA similarity network, the known miRNA–disease association network, and the integrated disease–disease similarity network. Then, the miRNA–disease associations were predicted on the heterogeneous network using the Katz method. The adjacency matrix of the heterogeneous network was expressed as follows:

$${\text{A}} = \left[ {\begin{array}{*{20}c} {IM} & {MD} \\ {MD^{T} } & {ID} \\ \end{array} } \right]$$

Then, the association between miRNAs and diseases was expressed by calculating the number of paths of different lengths among nodes:

$${\text{s}}^{{{\text{katz}}}} \left( A \right)_{ij} = \mathop \sum \limits_{l = 1}^{k} \beta^{l} \left( {A^{l} } \right)_{ij}$$

where \({\beta }\) is a non-negative constant that controls the influence of paths of different lengths, taking values in the range \(\left( {0,\min \left\{ {1,1/\left\| A \right\|_{2} } \right\}} \right)\), and k is the maximum path length considered. As k tends to infinity, the sum converges to the following closed form:

$${\text{s}}^{{{\text{katz}}}} = \mathop \sum \limits_{{{\text{l}} \ge 1}} {\upbeta }^{{\text{l}}} {\text{A}}^{{\text{l}}} = \left( {{\text{I}} - {\beta A}} \right)^{ - 1} - {\text{I}}$$

where I is the identity matrix. The matrix \({\text{s}}^{katz}\) has the same block structure as A (shown in formula (13)). \(MD_{e}\), the estimated miRNA–disease prediction matrix, is the upper-right submatrix of \({\text{s}}^{{{\text{katz}}}}\), that is, the block in the position that MD occupies in A.
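The closed form can be sketched directly in NumPy. The function name `katz_scores` and the toy matrices are hypothetical; the scaling of β by the largest absolute eigenvalue of A with a factor α follows the parameter-selection section of this paper, and the default α = 0.02 is the value reported there.

```python
import numpy as np

def katz_scores(IM, ID, MD, alpha=0.02):
    """Return the estimated miRNA-disease block MD_e and the beta that was used."""
    n_m, n_d = MD.shape
    # Heterogeneous adjacency matrix A = [[IM, MD], [MD^T, ID]].
    A = np.block([[IM, MD], [MD.T, ID]])
    lam_max = np.max(np.abs(np.linalg.eigvals(A)))  # spectral radius of A
    beta = alpha / lam_max                          # beta < 1 / lam_max when alpha < 1
    identity = np.eye(n_m + n_d)
    S = np.linalg.inv(identity - beta * A) - identity
    return S[:n_m, n_m:], beta  # upper-right block, same position as MD in A

# Toy inputs (hypothetical): 2 miRNAs, 3 diseases.
IM = np.array([[1.0, 0.5],
               [0.5, 1.0]])
ID = np.array([[1.0, 0.2, 0.0],
               [0.2, 1.0, 0.3],
               [0.0, 0.3, 1.0]])
MD = np.array([[1.0, 0.0, 1.0],
               [0.0, 1.0, 0.0]])

MD_e, beta = katz_scores(IM, ID, MD, alpha=0.5)
print(MD_e.shape)  # (2, 3)
```

For large networks, solving a linear system instead of forming the explicit inverse would be a reasonable alternative; the inverse is used here only to stay close to the formula.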

Association score refinement prediction

The refined prediction scores for miRNA–disease associations calculated by the KATZNCP model consist of two network consistency projection scores: the projection score in the miRNA space and the projection score in the disease space. The calculation is described below for the association prediction score between miRNA \({\text{m}}_{{\text{i}}}\) and disease \({\text{d}}_{{\text{j}}}\).

Let \(IM\left( {i,:} \right)\) (the i-th row of matrix IM) denote the vector formed by the similarity scores of miRNA \({\text{m}}_{{\text{i}}}\) with all miRNAs (including \({\text{m}}_{{\text{i}}}\) itself) in the integrated miRNA–miRNA similarity network IM, and let \(MD_{e} \left( {:,j} \right)\) (the j-th column of matrix \(MD_{e}\)) denote the vector formed by the estimated association scores of all miRNAs with disease \({\text{d}}_{{\text{j}}}\). In the miRNA space, the vector \({\text{IM}}\left( {{\text{i}},:} \right)\) represents the relationship between miRNA \({\text{m}}_{{\text{i}}}\) and all miRNAs, and the vector \({\text{MD}}_{{\text{e}}} \left( {:,{\text{j}}} \right)\) represents the relationship between disease \({\text{d}}_{{\text{j}}}\) and all miRNAs. The consistency between the two vectors can therefore be characterized by the projection of \({\text{IM}}\left( {{\text{i}},:} \right)\) onto \({\text{MD}}_{{\text{e}}} \left( {:,{\text{j}}} \right)\), which is called the space consistency projection score based on miRNAs. It is calculated as follows:

$$MD_{pm} \left( {i,j} \right) = \frac{{IM\left( {i,:} \right) \times MD_{e} \left( {:,j} \right)}}{{\left\| {MD_{e} \left( {:,j} \right)} \right\|}}$$

where \(\left\| {MD_{e} \left( {:,j} \right)} \right\|\) is the 2-norm of the vector \(MD_{e} \left( {:,j} \right)\).

The consistency projection score based on the disease space can be obtained by using the same method:

$$MD_{pd} \left( {i,j} \right) = \frac{{ID\left( {j,:} \right) \times MD_{e}^{T} \left( {:,i} \right)}}{{\left\| {MD_{e}^{T} \left( {:,i} \right)} \right\|}}$$

where \(\left\| {MD_{e}^{T} \left( {:,i} \right)} \right\|\) is the 2-norm of the vector \(MD_{e}^{T} \left( {:,i} \right)\), that is, of the i-th row of \(MD_{e}\).

Finally, the miRNA space consistency projection score and the disease space consistency projection score were integrated as follows to form the final prediction score:

$$MD^{*} = \frac{{MD_{pm} + MD_{pd}^{T} }}{2}.$$
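The two projections and their average can be sketched with matrix products. Several choices here are assumptions rather than details confirmed by the source: the function name `ncp_refine` is hypothetical, both projection matrices are stored in the miRNAs × diseases layout (so they are averaged directly; the transpose in the formula above would apply if the disease-space scores were stored as diseases × miRNAs), and a small `eps` is added to the norms to avoid division by zero.

```python
import numpy as np

def ncp_refine(IM, ID, MD_e, eps=1e-12):
    """Average of the miRNA-space and disease-space consistency projections."""
    col_norms = np.linalg.norm(MD_e, axis=0)  # ||MD_e(:, j)||, one per disease
    row_norms = np.linalg.norm(MD_e, axis=1)  # ||MD_e(i, :)||, one per miRNA
    # miRNA space: entry (i, j) = IM(i, :) . MD_e(:, j) / ||MD_e(:, j)||
    MD_pm = (IM @ MD_e) / (col_norms[None, :] + eps)
    # Disease space: entry (i, j) = ID(j, :) . MD_e(i, :) / ||MD_e(i, :)||
    MD_pd = (MD_e @ ID.T) / (row_norms[:, None] + eps)
    return (MD_pm + MD_pd) / 2.0

# Toy inputs (hypothetical): 2 miRNAs, 3 diseases.
IM = np.array([[1.0, 0.5],
               [0.5, 1.0]])
ID = np.array([[1.0, 0.2, 0.0],
               [0.2, 1.0, 0.3],
               [0.0, 0.3, 1.0]])
MD_e = np.array([[0.3, 0.1, 0.0],
                 [0.0, 0.4, 0.2]])

MD_star = ncp_refine(IM, ID, MD_e)
print(MD_star.shape)  # (2, 3)
```

Because a miRNA or disease with no estimated association still has non-zero similarity vectors, the projection can assign it a non-zero score as long as the corresponding norm is non-zero.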


Evaluation metrics

To systematically evaluate the performance of KATZNCP and the comparative methods, leave-one-out cross-validation (LOOCV) was employed. Specifically, one known miRNA–disease association was selected as the test sample and all other known miRNA–disease associations were used as training samples. This procedure was repeated until every known association had served as the test sample once. The prediction performance was visualized by the receiver operating characteristic (ROC) curve and quantified by the area under the ROC curve (AUC). The ROC curve shows graphically the relationship between sensitivity and specificity: by setting different thresholds, a series of corresponding sensitivities and specificities is calculated, and a curve is drawn with the true positive rate (TPR, or sensitivity) on the vertical axis and the false positive rate (FPR, or 1 − specificity) on the horizontal axis. TPR and FPR are calculated as follows:

$${\text{TPR}} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FN}}}}$$
$${\text{FPR}} = \frac{{{\text{FP}}}}{{{\text{FP}} + {\text{TN}}}}$$

where TP (true positive) is the number of positive samples predicted as positive; FP (false positive) is the number of negative samples predicted as positive; TN (true negative) is the number of negative samples predicted as negative; and FN (false negative) is the number of positive samples predicted as negative. Because no confirmed negative samples are available, an alternative procedure was used. First, the upper and lower bounds of the threshold were obtained from the prediction results, and a set of thresholds was determined accordingly. For any given threshold, a prediction was considered positive if the predicted value was greater than the threshold and negative otherwise.
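The threshold sweep described above can be sketched as follows. The function names, the number of thresholds, and the toy scores are hypothetical; the sketch also assumes that unlabeled miRNA–disease pairs are treated as negatives, which the source implies but does not spell out, and that the input contains at least one positive and one negative sample.

```python
import numpy as np

def roc_points(scores, labels, n_thresholds=200):
    """Sweep thresholds between the score bounds and collect (FPR, TPR) points."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    # Thresholds run from the upper bound to the lower bound of the scores.
    thresholds = np.linspace(scores.max(), scores.min(), n_thresholds)
    fpr, tpr = [], []
    for t in thresholds:
        predicted_positive = scores > t
        tp = np.sum(predicted_positive & labels)
        fp = np.sum(predicted_positive & ~labels)
        fn = np.sum(~predicted_positive & labels)
        tn = np.sum(~predicted_positive & ~labels)
        tpr.append(tp / (tp + fn))
        fpr.append(fp / (fp + tn))
    # Close the curve at (1, 1), where every sample is predicted positive.
    fpr.append(1.0)
    tpr.append(1.0)
    return np.array(fpr), np.array(tpr)

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Toy example (hypothetical scores); 1 marks a known association.
scores = [0.9, 0.8, 0.35, 0.3, 0.1]
labels = [1, 1, 0, 1, 0]
fpr, tpr = roc_points(scores, labels)
print(round(auc(fpr, tpr), 4))
```

With a finite threshold grid the AUC is an approximation whenever several scores fall between two adjacent thresholds; a finer grid, or one threshold per distinct score, reduces that error.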

Effect of parameter selection

In the equation \({\text{s}}^{{{\text{katz}}}} = \mathop \sum \limits_{{{\text{l}} \ge 1}} {\upbeta }^{{\text{l}}} {\text{A}}^{{\text{l}}} = \left( {{\text{I}} - {\beta A}} \right)^{ - 1} - {\text{I}}\), the value of parameter \(\beta\) affects the prediction performance. To ensure the convergence of the series, \(\beta\) must be smaller than the inverse of the maximum eigenvalue of the adjacency matrix A. To obtain the optimal \(\beta\), it was set to \(\beta = \alpha \times 1/eigA\), where eigA is the maximum eigenvalue of matrix A. Then, \({\upalpha }\) was increased from 0 to 0.9 in steps of 0.1, and LOOCV was performed for each of the 10 values to calculate the AUC. The results are shown in Fig. 2a. When \({\upalpha }\) = 0, the equation degenerates to \({\text{s}}^{{{\text{katz}}}} = 0\), so KATZNCP has no prediction capability. When \({\upalpha }\) was increased from 0.1 to 0.9, the AUC gradually decreased: on this coarse grid, the AUC reached its maximum of 0.9316 at \({\upalpha } = 0.1\), followed by 0.9299 at \({\upalpha } = 0.2\). To locate the optimum more precisely, \({\upalpha }\) was then increased from 0 to 0.2 in steps of 0.01 and LOOCV was performed again. The results are shown in Fig. 2b. When \({\upalpha }\) ranged between 0.01 and 0.05, the AUC fluctuated around 0.9320 and reached its maximum of 0.9325 at \({\upalpha } = 0.02\). When \({\upalpha }\) increased from 0.05 to 0.2, the AUC gradually decreased from 0.9316 to 0.9299. Therefore, 0.02 was finally selected as the value of \({\upalpha }\).
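The convergence condition on \(\beta\) can be seen from a short derivation, given here as a sketch in which \(\lambda_{\max }\) denotes the largest absolute eigenvalue (spectral radius) of A. Multiplying the truncated series by \(I - \beta A\) telescopes:

$$\left( {I - \beta A} \right)\mathop \sum \limits_{l = 0}^{k} \beta^{l} A^{l} = I - \beta^{k + 1} A^{k + 1}$$

If \(\beta \lambda_{\max } < 1\), then \(\beta^{k + 1} A^{k + 1} \to 0\) as \(k \to \infty\), so \(\sum_{l \ge 0} \beta^{l} A^{l} = \left( {I - \beta A} \right)^{ - 1}\). Removing the \(l = 0\) term, which equals I, gives \(\sum_{l \ge 1} \beta^{l} A^{l} = \left( {I - \beta A} \right)^{ - 1} - I\). With \(\beta = \alpha /\lambda_{\max }\), the condition holds for every \(\alpha\) in [0, 1), which covers the whole range of \(\alpha\) values tested here.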

Fig. 2 a The value of the AUC when \({\upalpha }\) was increased from 0 to 0.9. b The value of the AUC when \({\upalpha }\) was increased from 0 to 0.2

Comparison with state-of-the-art methods

MDHGI [72], NSEMDA [73], RFMDA [74], and SNMFMDA [75] are prediction models that use data resources similar to those of KATZNCP and have reported excellent prediction results. These methods were therefore selected for comparison with KATZNCP. Figure 3 shows the LOOCV results of each model, with AUC values of 0.8945, 0.8899, 0.8891, 0.9007, and 0.9325 for MDHGI, NSEMDA, RFMDA, SNMFMDA, and KATZNCP, respectively. KATZNCP achieved the highest AUC, a relative improvement of 4.25%, 4.79%, 4.88%, and 3.53% over MDHGI, NSEMDA, RFMDA, and SNMFMDA, respectively. Thus, in this LOOCV comparison, KATZNCP outperformed the four other models.
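The reported percentages are consistent with relative rather than absolute AUC gains. As a worked check for MDHGI:

$$\frac{0.9325 - 0.8945}{0.8945} = \frac{0.0380}{0.8945} \approx 0.0425 = 4.25\%$$

The other three values follow in the same way from the AUCs 0.8899, 0.8891, and 0.9007; the corresponding absolute differences in AUC are smaller (for example, 0.0380 for MDHGI).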

Fig. 3 ROC curves of five competitive methods

Validation of new miRNAs and isolated disease prediction capabilities

New miRNAs are miRNAs for which no disease association is known. With the continuous improvement of miRNA recognition techniques, an increasing number of miRNAs are being identified. Inspired by Liang et al. [76], another assessment scheme was adopted to evaluate the predictive power of the model for new miRNAs, namely, leave-one-miRNA-out cross-validation (LOMOCV). In each round, one miRNA was selected as the test sample, and all of its known disease associations were removed before testing. Then, all candidate diseases were prioritized by using only the association information of the other miRNAs. This was repeated until every miRNA had served as the test sample.

Isolated diseases are diseases for which no miRNA association is known. Similar to the simulation of new miRNAs, all known miRNA associations of a disease were removed to simulate an isolated disease. All candidate miRNAs were then prioritized by using only the association information of the other diseases. This scheme is known as leave-one-disease-out cross-validation (LODOCV).
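The masking step in both schemes can be sketched as below. Everything in this block is illustrative: `predict` stands for any scoring function that maps an association matrix to a score matrix of the same shape, and `toy_predict` with its fixed similarity matrices is a hypothetical stand-in, not the KATZNCP pipeline. The source does not say whether similarities derived from MD are recomputed after masking; a predictor that recomputes them from the masked matrix would avoid leaking the hidden associations.

```python
import numpy as np

def leave_one_disease_out(MD, predict):
    """LODOCV: score all miRNAs for each disease with its known links hidden."""
    n_m, n_d = MD.shape
    scores = np.zeros((n_m, n_d))
    for j in range(n_d):
        masked = MD.astype(float)  # astype returns a fresh copy
        masked[:, j] = 0.0         # simulate an isolated disease
        scores[:, j] = predict(masked)[:, j]
    return scores

def leave_one_mirna_out(MD, predict):
    """LOMOCV: the same procedure applied to rows (new miRNAs)."""
    return leave_one_disease_out(MD.T, lambda M: predict(M.T).T).T

# Hypothetical stand-in predictor with fixed similarity matrices.
SIM_M = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
SIM_D = np.array([[1.0, 0.2, 0.0],
                  [0.2, 1.0, 0.3],
                  [0.0, 0.3, 1.0]])

def toy_predict(M):
    return SIM_M @ M @ SIM_D

MD = np.array([[1, 0, 1],
               [0, 1, 0]])
lodocv_scores = leave_one_disease_out(MD, toy_predict)
lomocv_scores = leave_one_mirna_out(MD, toy_predict)
print(lodocv_scores.shape, lomocv_scores.shape)
```

The resulting score columns (or rows) can then be ranked and compared against the hidden associations to draw the ROC curve for each scheme.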

As shown in Fig. 4, the AUC of KATZNCP was 0.8256 under the LODOCV framework and 0.8351 under the LOMOCV framework.

Fig. 4 Results of KATZNCP for new miRNAs and isolated diseases

Case study

To demonstrate the capability of the proposed model KATZNCP to predict disease-associated miRNAs, two diseases, namely, lung neoplasms and esophageal neoplasms, were selected for case studies. All prediction results were validated against two independent databases, namely, HMDD v3.2 [77] and dbDEMC 2.0 [78].

Lung neoplasm is a kind of malignant tumor with rapid progression and poor prognosis. Distant metastasis often occurs and can lead to death. The detection rate of this disease in the early stage is low, which poses a great threat to people’s health [79]. Predicting miRNAs associated with lung neoplasms is therefore of great practical significance. All of the top 50 lung neoplasm-related miRNAs predicted by KATZNCP were supported in the two data sets, HMDD v3.2 and dbDEMC (Table 1).

Table 1 The top 50 lung neoplasm-related miRNAs

Esophageal neoplasm is the eighth most common cancer worldwide. The effectiveness of treatment for esophageal cancer depends largely on its cause [80]. For esophageal neoplasms, 47 of the top 50 predicted miRNAs were supported in the two data sets, HMDD v3.2 and dbDEMC (Table 2). Supporting evidence could not be found in these databases only for hsa-mir-200b, hsa-mir-302b, and hsa-mir-302c. However, evidence of the association between hsa-mir-200b and esophageal neoplasms was found by manually searching other literature. For example, S. Kirkilevsky [81] reported in 2020 that the expression of miRNA-200b and ERCC1 in EC cells can be used to predict the aggressiveness of esophageal cancer. In addition, Yang et al. [18] predicted the relationship between hsa-mir-302b and esophageal neoplasms through a computational method. This evidence further supports the predictive power of KATZNCP. Although no current medical trials have shown that hsa-mir-302b and hsa-mir-302c are related to esophageal neoplasms, further experiments may uncover their potential relationship.

Table 2 The top 50 esophageal neoplasm-related miRNAs

To test the predictive performance of KATZNCP for isolated diseases, isolated diseases were simulated by the same approach as in LODOCV. That is, all known miRNA associations of the disease to be verified were deleted before KATZNCP was implemented. For lung neoplasms, 132 known associations between lung neoplasms and miRNAs were deleted, and KATZNCP was used to predict the potential associations between miRNAs and lung neoplasms. All of the top 50 predicted miRNAs were supported in the HMDD v3.2 and dbDEMC databases (Table 3). For esophageal neoplasms, 74 known associations were deleted, and KATZNCP was used for prediction. Of the top 50 predicted associations, 49 were supported in the HMDD v3.2 and dbDEMC databases (Table 4). Only hsa-mir-200b was not confirmed by either database. However, as discussed in the preceding case analysis, available studies indicate a close relationship between hsa-mir-200b and esophageal neoplasms.

Table 3 The top 50 lung neoplasm-related miRNA candidates predicted by KATZNCP after removing all known lung neoplasm–miRNA associations, and the confirmation of these associations
Table 4 The top 50 esophageal neoplasm-related miRNA candidates predicted by KATZNCP after removing all known esophageal neoplasm–miRNA associations, and the confirmation of these associations

Discussion and conclusion

Numerous studies have shown that miRNAs play an important role in a wide range of biological processes. miRNAs are associated with the occurrence and development of many complex diseases, and many miRNAs are considered ideal biomarkers for disease prevention, diagnosis, and treatment. Because verifying the association between a miRNA and a disease through traditional biological experiments is time-consuming and labor-intensive, predicting potential miRNA–disease associations through computational methods, as an effective supplement to biological experiments, has become a hot topic in bioinformatics.

In this paper, a new prediction model, KATZNCP, was proposed, which consists of three stages: constructing accurate similarity networks, obtaining estimated miRNA–disease prediction scores by the KATZ algorithm, and obtaining two refined miRNA–disease scores by network consistency projection. A reasonable construction of the similarity relationships among diseases and among miRNAs can improve the prediction accuracy of a computational method. To construct such relationships, the Gaussian kernel function was applied to the topology of the known association network, and the Gaussian interaction profile kernel similarity among diseases and among miRNAs was calculated from the experimentally verified disease–miRNA association information. Then, an accurate disease similarity network was constructed by integrating the experimentally verified disease–miRNA association information, the semantic similarity network among diseases, and the Gaussian interaction profile kernel similarity among diseases. An accurate miRNA similarity network was constructed by integrating the experimentally verified disease–miRNA association information, the functional similarity network among miRNAs, and the Gaussian interaction profile kernel similarity among miRNAs. Afterward, the integrated disease similarity network, the integrated miRNA similarity network, and the known miRNA–disease associations were used to construct a heterogeneous network. The KATZ algorithm was applied to the heterogeneous network to obtain the initial association scores between miRNAs and diseases. The integrated disease similarity network and the integrated miRNA similarity network were then projected onto the initial score matrix to obtain the consistency information among vectors, which yielded the consistency projection score matrices based on the disease space and the miRNA space.

Finally, the two consistency projection scores were weighted to form the final miRNA–disease association prediction score. The prediction model is simple in design and low in time complexity, and it can be applied to the prediction of isolated diseases and new miRNAs. Because KATZ captures local information in the heterogeneous network, and the consistency projection captures global information from the experimentally verified disease–miRNA association network, the integrated miRNA similarity network, and the integrated disease similarity network, the prediction results were not biased toward miRNAs with more known associations (Additional file 1, Additional file 2, Additional file 3).

In the case study, lung neoplasms and esophageal neoplasms were selected for experimental study. Among the top 50 predicted miRNAs for each disease, 100% (lung neoplasms) and 94% (esophageal neoplasms) were validated in the HMDD v3.2 and dbDEMC databases. For the prediction of isolated disease cases, 100% and 98% of the top 50 miRNAs, respectively, were confirmed by the two above-mentioned databases. For some miRNAs without database confirmation, relevant supporting evidence was found in recent literature. The reliable prediction of KATZNCP provides insight into the identification of potential miRNA biomarkers and can contribute to future work on the involvement of miRNAs in human disease mechanisms.