Collaborative spam filtering based on incremental ontology learning

Telecommunication Systems

Abstract

Spam mail filtering is the classic problem of automatically recognizing that an incoming email is irrelevant to the user's context. This paper proposes a novel proxy server architecture for (i) collaboratively integrating useful features sent from personal email clients and (ii) improving the filtering performance of SMTP servers. Given a set of spam mails marked by multiple email users, the proxy server extracts two kinds of textual features: apriori terms and concept terms based on key phrases. More importantly, by taking semantic and statistical associations into account, the proxy aggregates these features into a hierarchical cluster structure. As a result, a spam ontology can be built and incrementally enriched. Email clients can then improve their spam filtering performance by referring to the semantic information in the ontology. To evaluate the proposed system, we collected a large number of spam mails within the same intranet environment. The system achieved a 17.4% lower filtering error rate than the individual email clients.
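
The collaborative, incremental part of this idea can be illustrated with a minimal sketch. It is not the authors' implementation: the class `SpamTermProxy`, its methods, and the `min_support` threshold are hypothetical names chosen for illustration, and a plain document-frequency threshold stands in for the paper's apriori-term extraction. The concept terms and the hierarchical clustering that form the ontology are omitted.

```python
import re
from collections import Counter


class SpamTermProxy:
    """Hypothetical proxy that pools spam reports from several email clients."""

    def __init__(self, min_support=0.5):
        # Assumed threshold: fraction of reported mails a term must occur in.
        self.min_support = min_support
        self.doc_count = 0
        self.term_doc_freq = Counter()

    @staticmethod
    def tokenize(text):
        # Lowercased set of alphabetic tokens; each term counts once per mail.
        return set(re.findall(r"[a-z]+", text.lower()))

    def report_spam(self, text):
        """Incrementally fold one user-marked spam mail into the shared counts."""
        self.doc_count += 1
        self.term_doc_freq.update(self.tokenize(text))

    def frequent_terms(self):
        """Terms whose document frequency meets the support threshold."""
        if self.doc_count == 0:
            return set()
        return {
            term
            for term, count in self.term_doc_freq.items()
            if count / self.doc_count >= self.min_support
        }

    def spam_score(self, text):
        """Fraction of the shared frequent terms that appear in a new mail."""
        terms = self.frequent_terms()
        if not terms:
            return 0.0
        return len(self.tokenize(text) & terms) / len(terms)


# Three reports, as if marked as spam by different users on the same intranet.
proxy = SpamTermProxy(min_support=0.5)
proxy.report_spam("Cheap pills, buy now")
proxy.report_spam("Buy cheap watches now")
proxy.report_spam("Win a free cruise now")

print(sorted(proxy.frequent_terms()))
print(proxy.spam_score("Buy now and save"))
print(proxy.spam_score("Meeting agenda for Monday"))
```

Because only counts are stored, each new report updates the shared term set without reprocessing earlier mails, which is the sense of "incremental" used in this sketch. A client could query `spam_score` as one extra signal next to its own local filter.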

Author information

Corresponding author

Correspondence to Jason J. Jung.

About this article

Cite this article

Pham, X.H., Lee, NH., Jung, J.J. et al. Collaborative spam filtering based on incremental ontology learning. Telecommun Syst 52, 693–700 (2013). https://doi.org/10.1007/s11235-011-9513-5

  • DOI: https://doi.org/10.1007/s11235-011-9513-5
