Abstract
Blogs are interactive, regularly updated websites that can be seen as diaries. These websites are composed of articles on distinct topics. It is therefore necessary to develop Information Retrieval approaches for this new web knowledge. The first important step of this process is the categorization of the articles. This paper compares several methods that use linguistic knowledge with the k-NN algorithm for automatic categorization of weblog articles.
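As a rough illustration of the k-NN categorization step named in the abstract, the sketch below assigns an article the majority category among its k most similar labelled articles. It is a minimal sketch under stated assumptions, not the authors' method: the whitespace tokenizer, the cosine similarity over raw word counts, the function names, and the toy training data are all hypothetical, and the linguistic knowledge studied in the paper is not reproduced here.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count lowercased whitespace tokens (a deliberately crude tokenizer, assumed here)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(article, labelled_articles, k=3):
    """Return the majority category among the k most similar labelled articles."""
    query = bag_of_words(article)
    scored = sorted(
        ((cosine(query, bag_of_words(text)), label) for text, label in labelled_articles),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Toy training data invented purely for illustration.
TRAINING = [
    ("the team won the football match", "sport"),
    ("a great goal in the second half of the match", "sport"),
    ("the new phone has a faster processor", "technology"),
    ("software update improves the laptop battery", "technology"),
]

if __name__ == "__main__":
    print(knn_classify("football match tonight", TRAINING, k=3))
```

A linguistically informed variant could, for example, change what `bag_of_words` keeps (such as only certain word classes) while leaving the neighbour vote unchanged; that substitution is an assumption about how such knowledge might plug in, not a description of the paper's experiments.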
Copyright information
© 2008 IFIP International Federation for Information Processing
About this paper
Cite this paper
Bayoudh, I., Bechet, N., Roche, M. (2008). Blog Classification: Adding Linguistic Knowledge to Improve the K-NN Algorithm. In: Shi, Z., Mercier-Laurent, E., Leake, D. (eds) Intelligent Information Processing IV. IIP 2008. IFIP – The International Federation for Information Processing, vol 288. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-87685-6_10
DOI: https://doi.org/10.1007/978-0-387-87685-6_10
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-87684-9
Online ISBN: 978-0-387-87685-6
eBook Packages: Computer Science (R0)