Blog Classification: Adding Linguistic Knowledge to Improve the K-NN Algorithm

Bayoudh, Ines; Bechet, Nicolas; Roche, Mathieu

doi:10.1007/978-0-387-87685-6_10

Part of the book series: IFIP – The International Federation for Information Processing (IFIPAICT, volume 288)

Included in the conference series: International Conference on Intelligent Information Processing

Abstract

Blogs are interactive, regularly updated websites that can be seen as diaries. These websites are composed of articles on distinct topics. It is therefore necessary to develop Information Retrieval approaches for this new source of web knowledge. The first important step of this process is the categorization of the articles. This paper compares several methods that combine linguistic knowledge with the k-NN algorithm for the automatic categorization of weblog articles.

Author information

Authors and Affiliations

Centre Urbain Nord, Université du 7 Novembre à Carthage, Tunis, Tunisie
Ines Bayoudh
LIRMM UMR 5506, CNRS Université Montpellier 2, France
Ines Bayoudh, Nicolas Bechet & Mathieu Roche

Editor information

Editors and Affiliations

Institute of Computing Technology, Chinese Academy of Sciences, China
Zhongzhi Shi
MODEME, IAE Research Center, Lyon University, France
E. Mercier-Laurent
Indiana University, USA
D. Leake

About this paper

Cite this paper

Bayoudh, I., Bechet, N., Roche, M. (2008). Blog Classification: Adding Linguistic Knowledge to Improve the K-NN Algorithm. In: Shi, Z., Mercier-Laurent, E., Leake, D. (eds) Intelligent Information Processing IV. IIP 2008. IFIP – The International Federation for Information Processing, vol 288. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-87685-6_10

DOI: https://doi.org/10.1007/978-0-387-87685-6_10
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-87684-9
Online ISBN: 978-0-387-87685-6
eBook Packages: Computer Science, Computer Science (R0)
