Abstract
Discovering new relationships between entities in networked data is essential in applications such as sociology, security, physics, and biology. This paper introduces a novel approach to directed link prediction, filling a notable research gap by accounting for the directionality of relationships, which traditional methods often overlook. We present three algorithms: an asymmetric similarity-popularity algorithm, which applies the similarity-popularity paradigm specifically to directed networks; a path exploration algorithm, which uses path patterns, closure probabilities, and the exploratory potential of paths to predict the formation of new links; and a hybrid algorithm that merges the strengths of both approaches. The effectiveness of these methods is evaluated on real-life networks, demonstrating robust performance across various types and sizes of networked data. In addition to assessing predictive power and runtime performance, we study the impact of predicted links on network spreading capacity. This perspective provides insight into the broader implications of our algorithms for network behavior and dynamics.
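To illustrate why direction matters for path-based prediction, the following is a minimal sketch of a generic directed baseline. It is not one of the three algorithms of this paper. It scores a candidate link u → v by the number of directed two-step paths u → w → v that the link would close. The function name `directed_two_path_scores` and the toy edge list are assumptions made for illustration only.

```python
from collections import defaultdict


def directed_two_path_scores(edges):
    """Score each absent link (u, v) by the number of directed paths u -> w -> v.

    Generic illustrative baseline (an assumption, not the paper's method).
    """
    succ = defaultdict(set)  # successors of each node
    for u, v in edges:
        succ[u].add(v)

    scores = {}
    for u in list(succ):  # snapshot of keys; succ is not modified below
        for w in succ[u]:
            for v in succ.get(w, ()):  # .get avoids creating new keys
                # Skip self-loops and links that already exist.
                if v != u and v not in succ[u]:
                    scores[(u, v)] = scores.get((u, v), 0) + 1
    return scores


# Hypothetical toy network: two directed paths lead from "a" to "c".
toy_edges = [("a", "b"), ("b", "c"), ("a", "d"), ("d", "c")]
toy_scores = directed_two_path_scores(toy_edges)
print(toy_scores)  # {('a', 'c'): 2}
```

The score is asymmetric: the pair ("a", "c") receives a score of 2, whereas the reverse pair ("c", "a") receives none, a distinction that an undirected common-neighbors count would not make.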
Availability of data and materials
Data sharing not applicable to this article as no datasets were generated or analysed during the current study.
Acknowledgements
This research project was supported by a grant from the “Research Center of the Female Scientific and Medical Colleges”, Deanship of Scientific Research, King Saud University.
Ethics declarations
Conflicts of interest
The authors declare that they have no financial interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
A Data description
B Proof of Theorem 1
Proof
The proof analyzes the hierarchical sub-graph generated from a randomly chosen node \(i\), with levels numbered starting from 0. Let \(v_l\) be the number of nodes in level \(l\), \(e_l\) the number of edges starting at level \(l\), and \(e_{l}^{l'}\) the number of edges linking levels \(l\) and \(l'\). We have, by definition:
At level 0, \(\mathbb{E}(v_0) = 1\), \(\mathbb{E}(e_0) = \bar{\kappa}\), and, due to degree independence, \(\mathbb{E}(v_1) = \bar{\kappa}\), and
Among the \(e_1\) edges, \(e_0^1\) edges point to level 0, that is, \(\mathbb{E}(e_{1}^{0}) = \bar{\kappa}\). Furthermore, by definition of the average clustering coefficient, any two nodes in level 1 are connected with probability \(\bar{C}\). Hence,
Using (20), this implies that \(\mathbb{E}(e_{1}^{2})=\bar{\kappa}(1-\bar{C})(\bar{\kappa}-1)\). Since some of these edges will point to the same nodes, \(v_2\) is then the expected number of distinct nodes obtained by selecting \(e_{1}^{2}\) nodes with repetition from the remaining \(n-n_1\) nodes:
Since we can choose at most \(e_{1}^{2}\) distinct nodes,
Note that this upper limit can be reached when \(n\) is very large, so that the approximation
is valid. Using this approximation in (25) results in:
From (26):
Using a similar argument for the next stage, we can show that:
The number of edges internal to level 2 is minimized when each node in level 2 has a single parent in level 1. Hence:
It follows then:
More generally, for all \(l \ge 1\):
The expected number of nodes in the sub-graph can then be bounded by:
The expected number of edges in this sub-graph is bounded by:
\(\square \)
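The counting step used in the proof, namely the expected number of distinct nodes obtained when selecting nodes with repetition, can be checked numerically. The sketch below assumes the draws are independent and uniform over the remaining nodes, in which case the expected number of distinct nodes after \(k\) draws from \(m\) nodes is \(m\left(1-(1-1/m)^k\right)\). This is a standard result and is not necessarily the exact expression used in the proof. The names `expected_distinct`, `simulate_distinct`, `m`, and `k` are hypothetical.

```python
import random


def expected_distinct(m: int, k: int) -> float:
    """Expected number of distinct items in k uniform draws with replacement from m items."""
    return m * (1.0 - (1.0 - 1.0 / m) ** k)


def simulate_distinct(m: int, k: int, trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity (assumes independent uniform draws)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += len({rng.randrange(m) for _ in range(k)})
    return total / trials


# Small pool: repeated selections are likely, so the value falls well below k.
small = expected_distinct(10, 5)

# Very large pool: collisions are rare, so the value approaches the upper limit k.
large = expected_distinct(10**6, 50)
```

Under these assumptions the value never exceeds \(k\), and it approaches \(k\) as \(m\) grows, which matches the remark in the proof that the upper limit can be reached when \(n\) is very large.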
C Detailed results
This section presents the detailed results of the performance evaluation experiments for each network (see Tables 6–10). For each performance measure, results attaining the best significant rank at a significance level of \(p=0.05\) are highlighted in bold.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Benhidour, H., Almeshkhas, L. & Kerrache, S. Link prediction in directed complex networks: combining similarity-popularity and path patterns mining. Appl Intell (2024). https://doi.org/10.1007/s10489-024-05565-0
DOI: https://doi.org/10.1007/s10489-024-05565-0