NRWalk2Vec-HIN: spammer group detection based on heterogeneous information network embedding over social media

The Journal of Supercomputing

Abstract

Online reviews have a significant influence on consumers' purchasing decisions. Unfortunately, many sellers exploit these reviews by employing large numbers of spammers who strategically craft fake reviews to enhance their own reputation or tarnish that of their competitors. Although several models have been proposed to address this issue in recent years, they often overlook the value of combining structural and behavioural features, focusing solely on either structural or textual aspects. To overcome these limitations, we propose a novel model for detecting organized spammer groups, called the Fake Reviewers Groups Detection Model. The model considers both the network structure and reviewer behavioural features to identify such groups. By extracting user, product, review time, and rating information, we construct a heterogeneous information network using a meta-graph, which captures relationships between users. We then apply the Node Ranking Walk2Vec algorithm to generate random walks within this heterogeneous information network, which yields low-dimensional vector representations of the user nodes. Subsequently, we use the Gaussian fuzzy Cluster Means algorithm to cluster these vectors and thereby identify candidate groups of reviewers. The dynamic weight of each detection indicator is determined using the entropy approach, which allows us to rank reviewer groups by their level of suspiciousness and identify the fake reviewer groups.
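The final ranking step can be illustrated with the standard entropy weight method. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the indicators, their normalization, or the exact scoring rule, so the indicator matrix, the function names `entropy_weights` and `rank_groups`, and the weighted-sum score are all hypothetical. Each row is a candidate reviewer group and each column is a non-negative detection indicator for which larger values are assumed to mean more suspicious behaviour.

```python
import math


def entropy_weights(matrix):
    """Return one weight per column using the standard entropy weight method.

    matrix: list of rows (candidate groups), each a list of non-negative
    indicator values. Columns that vary more across groups get more weight.
    """
    n = len(matrix)
    m = len(matrix[0])
    divergences = []
    for j in range(m):
        column = [row[j] for row in matrix]
        total = sum(column)
        if total == 0 or n < 2:
            # A column with no information contributes nothing.
            divergences.append(0.0)
            continue
        entropy = 0.0
        for value in column:
            p = value / total
            if p > 0:  # treat 0 * log(0) as 0
                entropy -= p * math.log(p)
        entropy /= math.log(n)  # normalize entropy to the range [0, 1]
        # Clamp to guard against tiny negative values from rounding.
        divergences.append(max(0.0, 1.0 - entropy))
    d_sum = sum(divergences)
    if d_sum == 0:
        # All indicators are uninformative: fall back to equal weights.
        return [1.0 / m] * m
    return [d / d_sum for d in divergences]


def rank_groups(matrix):
    """Score each group as a weighted sum and sort from most to least suspicious."""
    weights = entropy_weights(matrix)
    scores = [sum(w * x for w, x in zip(weights, row)) for row in matrix]
    order = sorted(range(len(matrix)), key=lambda i: scores[i], reverse=True)
    return weights, scores, order


# Hypothetical indicator values for three candidate groups (assumption:
# already scaled to [0, 1]; the second indicator is identical for all groups).
indicators = [
    [0.9, 0.5, 0.8],
    [0.2, 0.5, 0.1],
    [0.6, 0.5, 0.4],
]
weights, scores, order = rank_groups(indicators)
print("weights:", weights)
print("ranking (most suspicious first):", order)
```

In this toy input the second indicator is constant across groups, so its entropy is maximal and its weight is effectively zero. This is the sense in which the weights are dynamic: they depend on how well each indicator discriminates between the groups at hand.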

Availability of data and materials

All data generated or analysed during this study are included in the published articles [53] and [54].

References

  1. Barry E (2022) Yelp statistics 2022 demographics, users and facts. https://www.enterpriseappstoday.com/stats/yelp-statistics.html

  2. Qiu J, Li Y, Lin Z (2020) Detecting social commerce: an empirical analysis on yelp. J Electron Commerce Res 21(3):168–179

  3. Heydari A, ali Tavakoli M, Salim N, Heydari Z (2015) Detection of review spam: a survey. Expert Syst Appl 42(7):3634–3642

  4. Mukherjee A, Liu B, Glance N (2012) Spotting fake reviewer groups in consumer reviews. In: Proceedings of the 21st International Conference on World Wide Web, pp 191–200

  5. Jindal N, Liu B (2008) Opinion spam and analysis. In: Proceedings of the 2008 International Conference on Web Search and Data Mining, pp 219–230

  6. Lim E-P, Nguyen V-A, Jindal N, Liu B, Lauw HW (2010) Detecting product review spammers using rating behaviors. In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management, pp 939–948

  7. Mukherjee A, Venkataraman V, Liu B, Glance N (2013) What yelp fake review filter might be doing? In: Proceedings of the International AAAI Conference on Web and Social Media, pp 409–418

  8. Fei G, Mukherjee A, Liu B, Hsu M, Castellanos M, Ghosh R (2013) Exploiting burstiness in reviews for review spammer detection. In: Proceedings of the International AAAI Conference on Web and Social Media, pp 175–184

  9. Liu Y, Pang B (2018) A unified framework for detecting author spamicity by modeling review deviation. Expert Syst Appl 112:148–155

  10. Chen H, Liu J, Lv Y, Li MH, Liu M, Zheng Q (2018) Semi-supervised clue fusion for spammer detection in sina weibo. Inf Fusion 44:22–32

  11. Barbado R, Araque O, Iglesias CA (2019) A framework for fake review detection in online consumer electronics retailers. Inf Process Manag 56(4):1234–1244

  12. El-Mawass N, Honeine P, Vercouter L (2020) Similcatch: enhanced social spammers detection on twitter using Markov random fields. Inf Process Manag 57(6):102317

  13. Koggalahewa D, Yue X, Foo E (2022) An unsupervised method for social network spammer detection based on user information interests. J Big Data 9(1):1–35

  14. Dou Y, Ma G, Yu PS, ** Yu, Alazab M, Shalaginov A (2021) Deep graph neural network-based spammer detection under the perspective of heterogeneous cyberspace. Future Gener Comput Syst 117:205–218

  15. Wang Z, Wei W, Mao X-L, Guo G, Zhou P, Jiang S (2022) User-based network embedding for opinion spammer detection. Pattern Recognit 125:108512

  16. Xu C, Zhang J, Chang K, Long C (2013) Uncovering collusive spammers in Chinese review websites. In: Proceedings of the 22nd ACM International Conference on Information and Knowledge Management, pp 979–988

  17. Xu C, Zhang J (2015) Towards collusive fraud detection in online reviews. In: 2015 IEEE International Conference on Data Mining, pp 1051–1056. IEEE

  18. Dematis I, Karapistoli E, Vakali A (2018) Fake review detection via exploitation of spam indicators and reviewer behavior characteristics. In: International Conference on Current Trends in Theory and Practice of Informatics, pp 581–595. Springer

  19. Dou T, Yu J, Xiong Q, Gao M, Song Y, Fang Q (2017) Collaborative shilling detection bridging factorization and user embedding. In: International Conference on Collaborative Computing: Networking, Applications and Worksharing, pp 459–469. Springer

  20. Ye J, Akoglu L (2015) Discovering opinion spammer groups by network footprints. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp 267–282. Springer

  21. Wang Z, Hou T, Song D, Li Z, Kong T (2016) Detecting review spammer groups via bipartite graph projection. Comput J 59(6):861–874

  22. Wang Z, Songmin G, Zhao X, Xiaowei X (2018) Graph-based review spammer group detection. Knowl Inf Syst 55(3):571–597

  23. Li H, Fei G, Wang S, Liu B, Shao W, Mukherjee A, Shao J (2017) Bimodal distribution and co-bursting in review spam detection. In: Proceedings of the 26th International Conference on World Wide Web, pp 1063–1072

  24. Ji S-J, Zhang Q, Li J, Chiu Dickson KW, Xu S, Yi L, Gong M (2020) A burst-based unsupervised method for detecting review spammer groups. Inf Sci 536:454–469

  25. Zhang F, Hao X, Chao J, Yuan S (2020) Label propagation-based approach for detecting review spammer groups on e-commerce websites. Knowl-Based Syst 193:105520

  26. Wang S, Zhang P, Wang H, Hongtao Yu, Zhang F (2022) Detecting shilling groups in online recommender systems based on graph convolutional network. Inf Process Manag 59(5):103031

  27. Zhang F, Yuan S, Zhang P, Chao J, Yu H (2022a) Detecting review spammer groups based on generative adversarial networks. Inf Sci

  28. Chao J, Zhao C, Zhang F (2022) Network embedding-based approach for detecting collusive spamming groups on e-commerce platforms. Secur Commun Netw

  29. He D, Pan M, Hong K, Cheng Y, Chan S, Liu X, Guizani N (2020) Fake review detection based on pu learning and behavior density. IEEE Netw 34(4):298–303

  30. Jing-Yu C, Ya-Jun W (2022) Semi-supervised fake reviews detection based on aspamgan. J Artif Intell 4(1):17–36

  31. Filho MC, Rafael DN, Barros Lucia SGME (2023) Mind the fake reviews! protecting consumers from deception through persuasion knowledge acquisition. J Bus Res 156:113538

  32. Liu Y, Wang L, Shi T, Li J (2022) Detection of spam reviews through a hierarchical attention architecture with N-gram CNN and Bi-LSTM. Inf Syst 103:101865

  33. Rupesh Kumar D, Anil Kumar S (2018) State-of-art approaches for review spammer detection: a survey. J Intell Inf Syst 50(2):231–264

  34. Zhang Y, Tan Y, Zhang M, Liu Y, Chua T-S, Ma S (2015) Catch the black sheep: unified framework for shilling attack detection based on fraudulent action propagation. In: Twenty-Fourth International Joint Conference on Artificial Intelligence

  35. Do Quynh NT, Hussain FK, Nguyen BT (2017) A fuzzy approach to detect spammer groups. In: 2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp 1–6. IEEE

  36. Wang X, Liu K, Zhao J (2017) Handling cold-start problem in review spam detection by jointly embedding texts and behaviors. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (vol 1: Long Papers), pp 366–376

  37. Zheng Z, Mingyang Z, Jun W, Kezhong L, Guoliang C, Liao H (2022b) Spammer detection via ranking aggregation of group behavior. Expert Syst Appl, pp 119454

  38. Akoglu L, Chandy R, Faloutsos C (2013) Opinion fraud detection in online reviews by network effects. In: Proceedings of the International AAAI Conference on Web and Social Media, pp 2–11

  39. Zheng M, Zhou C, Wu J, Pan S, Shi J, Guo L (2018) Fraudne: a joint embedding approach for fraud detection. In: 2018 International Joint Conference on Neural Networks (IJCNN), pp 1–8. IEEE

  40. Zhu C, Zhao W, Li Q, Li P, Da Q (2019) Network embedding-based anomalous density searching for multi-group collaborative fraudsters detection in social media. Computers, Materials and Continua

  41. Cao C, Li S, Yu S, Chen Z (2021) Fake reviewer group detection in online review systems. In: 2021 International Conference on Data Mining Workshops (ICDMW), pp 935–942. IEEE

  42. Cavallari S, Zheng VW, Cai H, Chang KC-C, Cambria E (2017) Learning community embedding with community detection and node embedding on graphs. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp 377–386

  43. Jin W, Derr T, Wang Y, Ma Y, Liu Z, Tang J (2021) Node similarity preserving graph convolutional networks. In: Proceedings of the 14th ACM International Conference on Web Search and Data Mining, pp 148–156

  44. Park N, Kan A, Dong XL, Zhao T, Faloutsos C (2019) Estimating node importance in knowledge graphs using graph neural networks. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 596–606

  45. Rhouma D, Romdhane LB (2014) An efficient algorithm for community mining with overlap in social networks. Expert Syst Appl 41(9):4309–4321

  46. Gao Y, Wang Z, Xie J, Pan J (2022) A new robust fuzzy c-means clustering method based on adaptive elastic distance. Knowl-Based Syst 237:107769

  47. Rasmussen C (1999) The infinite gaussian mixture model. In: Advances in Neural Information Processing Systems, 12

  48. Chaudhary L, Singh B (2020) Community detection using maximizing modularity and similarity measures in social networks. In: Smart Systems and IoT: Innovations in Computing: Proceeding of SSIC 2019, pp 197–206. Springer

  49. Askari S (2021) Fuzzy c-means clustering algorithm for data with unequal cluster sizes and contaminated with noise and outliers: review and development. Expert Syst Appl 165:113856

  50. Ni J, Muhlstein L, McAuley J (2019) Modeling heart rate and activity data for personalized fitness recommendation. In: The World Wide Web Conference, pp 1343–1353

  51. Rayana S, Akoglu L (2015) Collective opinion spam detection: bridging review networks and metadata. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 985–994

  52. Zhang L, He G, Cao J, Zhu H, Bingfeng X (2018) Spotting review spammer groups: a cosine pattern and network based method. Concurrency Comput: Pract Exp 30(20):e4686

Funding

This research received no specific grant from any financing agency in the public, commercial, or not-for-profit sectors.

Author information

Contributions

All authors contributed equally to this work. In addition, all authors have read and approved the final manuscript and given their consent for publication of the article.

Corresponding author

Correspondence to Arvind Mewada.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Ethical approval

Ethical approval was not required for this article because it does not contain any studies with human participants. No animal studies were involved in this review.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mewada, A., Dewang, R.K. NRWalk2Vec-HIN: spammer group detection based on heterogeneous information network embedding over social media. J Supercomput 80, 1818–1851 (2024). https://doi.org/10.1007/s11227-023-05537-0

  • DOI: https://doi.org/10.1007/s11227-023-05537-0
