Collaborative spam filtering based on incremental ontology learning

Pham, Xuan Hau; Lee, Nam-Hee; Jung, Jason J.; Sadeghi-Niaraki, Abolghasem

doi:10.1007/s11235-011-9513-5

Published: 18 June 2011

Volume 52, pages 693–700, (2013)

Telecommunication Systems

Xuan Hau Pham¹,
Nam-Hee Lee²,
Jason J. Jung¹ &
Abolghasem Sadeghi-Niaraki³

Abstract

Spam mail filtering is a classic problem of automatically recognizing the irrelevance of incoming emails to user contexts. This paper proposes a novel proxy server architecture for (i) collaboratively integrating useful features sent from personal email clients and (ii) improving the filtering performance of SMTP servers. Given a set of spam mails marked by multiple email users, the proxy server extracts two kinds of textual features: apriori terms and concept terms based on key phrases. More importantly, by taking into account their semantics and statistical associations, the proxy aggregates them into a hierarchical cluster structure. As a result, a spam ontology can be built and incrementally enriched. Email clients can then improve their spam filtering performance by referring to the semantic information in the ontology. To evaluate the proposed system, we collected a large number of spam mails within a single intranet environment. The system showed a 17.4% lower filtering error rate than the single email clients.
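
The abstract does not say how the apriori terms are mined. As a rough illustration only, the sketch below assumes they are term sets that co-occur frequently across the spam mails pooled from several users, found with the classic Apriori procedure. The function names, the whitespace tokenizer, the `min_support` threshold, and the sample messages are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase a message and return its set of purely alphabetic tokens."""
    return {tok for tok in text.lower().split() if tok.isalpha()}


def frequent_term_sets(messages, min_support=0.5, max_size=2):
    """Map each frequent term set to its support (share of messages).

    Assumption: "apriori terms" are treated here as frequent term sets.
    A candidate of size k is counted only if all of its (k-1)-subsets
    were frequent in the previous pass (the Apriori pruning rule).
    """
    docs = [tokenize(m) for m in messages]
    n = len(docs)
    # Pass 1: single terms that appear in enough messages.
    counts = Counter(term for doc in docs for term in doc)
    level = {frozenset([t]) for t, c in counts.items() if c / n >= min_support}
    result = {s: counts[next(iter(s))] / n for s in level}
    size = 2
    while level and size <= max_size:
        terms = sorted(set().union(*level))
        next_level = set()
        for combo in combinations(terms, size):
            # Skip candidates with an infrequent (size-1)-subset.
            if any(frozenset(sub) not in level
                   for sub in combinations(combo, size - 1)):
                continue
            cand = frozenset(combo)
            support = sum(1 for doc in docs if cand <= doc) / n
            if support >= min_support:
                next_level.add(cand)
                result[cand] = support
        level = next_level
        size += 1
    return result


# Hypothetical spam mails marked by different users and pooled at the proxy.
marked_spam = [
    "cheap loan offer now",
    "cheap loan approved today",
    "win a free prize",
    "loan offer cheap rates",
]
sets = frequent_term_sets(marked_spam, min_support=0.75)
```

With these sample messages, only "cheap", "loan", and the pair of the two reach the 0.75 support threshold. Pooling messages from several users before counting is what makes the support values reflect shared spam vocabulary instead of a single inbox.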

Author information

Authors and Affiliations

Department of Computer Engineering, Yeungnam University, Dae-Dong, Gyeongsan, 712-749, Korea
Xuan Hau Pham & Jason J. Jung
College of Business Administration, Sogang University, Seoul, Korea
Nam-Hee Lee
Department of Geoinformatic Engineering, Inha University, Incheon, Korea
Abolghasem Sadeghi-Niaraki

Corresponding author

Correspondence to Jason J. Jung.

About this article

Cite this article

Pham, X.H., Lee, NH., Jung, J.J. et al. Collaborative spam filtering based on incremental ontology learning. Telecommun Syst 52, 693–700 (2013). https://doi.org/10.1007/s11235-011-9513-5

Published: 18 June 2011
Issue Date: February 2013
DOI: https://doi.org/10.1007/s11235-011-9513-5
