Detecting Internet Hidden Paid Posters Based on Group and Individual Characteristics

  • Conference paper
Web Information Systems Engineering – WISE 2015 (WISE 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9419)

Abstract

Online social networks are popular communication tools for billions of users. Unfortunately, they are also effective tools for hidden paid posters (called the Internet water army in some of the literature) to propagate spam or mendacious messages. Paid posters are typically organized in groups that post with specific purposes, and they have flooded microblogging communities. Traditional detection methods typically use only individual characteristics. In this paper, we study the group characteristics of paid posters and find that, alongside individual characteristics, they are also very important for detection. We construct a classifier based on both individual and group characteristics to detect paid posters. Extensive experiments show that our method outperforms existing methods.

Notes

  1. http://digi.163.com/14/0919/15/A6H2KS8H00162OUT.html

  2. MapReduce: http://en.wikipedia.org/wiki/MapReduce

  3. Apache Hadoop: http://en.wikipedia.org/wiki/Apache_Hadoop

  4. ICTCLAS: http://ictclas.org/index.html

  5. SINA Weibo: http://www.weibo.com/

Acknowledgments

This work was supported by the 973 Program of China (Grant Nos. 2013CB329601, 2013CB329602, 2013CB329604), the NSFC of China (Grant Nos. 60933005, 91124002), the 863 Program of China (Grant Nos. 2012AA01A401, 2012AA01A402), and the National Key Technology R&D Program of China (Grant Nos. 2012BAH38B04, 2012BAH38B06).

Author information

Corresponding author

Correspondence to Xiang Wang.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Wang, X., Zhou, B., Jia, Y., Li, S. (2015). Detecting Internet Hidden Paid Posters Based on Group and Individual Characteristics. In: Wang, J., et al. Web Information Systems Engineering – WISE 2015. WISE 2015. Lecture Notes in Computer Science, vol 9419. Springer, Cham. https://doi.org/10.1007/978-3-319-26187-4_8

  • DOI: https://doi.org/10.1007/978-3-319-26187-4_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26186-7

  • Online ISBN: 978-3-319-26187-4

  • eBook Packages: Computer Science, Computer Science (R0)
