Performance Evaluation of Clustering-Based Classification Algorithms for Detection of Online Spam Reviews

  • Conference paper
Data Intelligence and Cognitive Informatics

Part of the book series: Algorithms for Intelligent Systems (AIS)

Abstract

The social Web contains an enormous amount of content, and a wide range of online users rely on online reviews when deciding whether to purchase a product or use a service. Not all online reviews are genuine; some of them are spam. Data mining classification helps in labelling a review as spam or ham. Many text classification algorithms exist, and it has been shown that these classifiers can be improved when clustering is used together with classification to form features. This research work focuses on identifying the right classifier and improving it through clustering, and on identifying the clustering technique best suited to improving the classifier's performance. Three classification algorithms (Naive Bayes, support vector machine (SVM) and Decision Tree) and three clustering algorithms (K-means, One-Pass and DBSCAN, density-based spatial clustering of applications with noise) are compared, and K-means clustering used with the SVM classifier is found to outperform the other combinations. Amazon and Yelp datasets are used for implementation. When used with K-means clustering, the accuracy of the SVM classifier improves from 89.02% to 90.02% on the Amazon dataset and from 86.03% to 88.25% on the Yelp dataset, a larger improvement than for the other combinations compared.
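The abstract does not say how the cluster-derived features are built, so the following is only a minimal sketch of one common way to do it: run K-means on the feature vectors and append each vector's distance to every centroid as extra features, which a classifier such as an SVM could then be trained on. The function names, the toy 2-D vectors and the deterministic initialisation are assumptions made for illustration, not the authors' implementation.

```python
import math

def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means; centroids start at the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        # Assignment step: group each point with its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group.
        for c, members in enumerate(groups):
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def add_cluster_features(points, centroids):
    """Append the distance to every centroid to each feature vector."""
    return [list(p) + [math.dist(p, c) for c in centroids] for p in points]

# Toy 2-D vectors (hypothetical values standing in for text features of reviews).
reviews = [(0.0, 0.0), (10.0, 10.0), (0.0, 2.0), (10.0, 12.0)]
centroids = kmeans(reviews, k=2)
augmented = add_cluster_features(reviews, centroids)
```

Each augmented vector keeps its original features and gains k distance features, so the classifier sees both the raw representation and the vector's position relative to the discovered clusters.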



Corresponding author

Correspondence to N. Krishnaveni.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Krishnaveni, N., Radha, V. (2021). Performance Evaluation of Clustering-Based Classification Algorithms for Detection of Online Spam Reviews. In: Jeena Jacob, I., Kolandapalayam Shanmugam, S., Piramuthu, S., Falkowski-Gilski, P. (eds) Data Intelligence and Cognitive Informatics. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-15-8530-2_20
