A Novel Approach for Text Classification Using Feature Selection Algorithm and Term Weight Measures

  • Conference paper
Accelerating Discoveries in Data Science and Artificial Intelligence I (ICDSAI 2023)

Abstract

Text classification is the task of determining the class label of an unseen text document. The vector representation of a document plays a crucial role in the efficiency of the classification process. Several text classification approaches use content-based features such as words to represent documents as vectors. Words with high discriminating power improve classification performance, so identifying such words among a very large vocabulary is an essential step. This high-dimensionality problem is addressed with feature selection methods. Several feature selection methods proposed in the literature are based on how terms are distributed across the classes of a dataset. In this chapter, we develop an approach for text classification (TC) that combines a feature selection algorithm (FSA) with term weight measures (TWMs). A new feature selection method is developed to remove redundant features and select relevant ones. The selected features are used to represent the documents as vectors, and the value of each term in the vector is computed with a TWM. We also propose a new TWM and compare its performance with that of several well-known TWMs. Six classification algorithms, namely support vector machine (SVM), decision tree (DT), Naïve Bayes (NB), k-nearest neighbour (KNN), logistic regression (LR), and random forest (RF), are used to build the classification models. Experiments are performed on six benchmark TC datasets. The results show that the proposed approach achieves the best accuracies on the six datasets compared with other works in the TC domain.
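The two-stage pipeline described in the abstract (select discriminative terms, then weight them to build document vectors) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the proposed FSA or TWM, so the sketch substitutes the standard chi-square statistic for feature selection and a smoothed TF-IDF for term weighting. The toy corpus and the names `DOCS`, `chi_square`, `select_features`, and `tfidf_vector` are hypothetical.

```python
import math
from collections import Counter

# Toy labelled corpus (hypothetical data, for illustration only).
DOCS = [
    ("the team won the match", "sport"),
    ("the player scored a goal in the match", "sport"),
    ("the bank raised the interest rate", "finance"),
    ("the market and the bank reported a profit", "finance"),
]

def chi_square(term, cls, docs):
    """Chi-square score of a term for one class, from document counts.

    Stand-in for the chapter's feature selection method, which the
    abstract does not specify.
    """
    n = len(docs)
    a = sum(1 for text, label in docs if term in text.split() and label == cls)
    b = sum(1 for text, label in docs if term in text.split() and label != cls)
    c = sum(1 for text, label in docs if term not in text.split() and label == cls)
    d = n - a - b - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def select_features(docs, k):
    """Keep the k terms with the highest chi-square score over any class."""
    vocab = sorted({w for text, _ in docs for w in text.split()})
    classes = sorted({label for _, label in docs})
    scores = {t: max(chi_square(t, c, docs) for c in classes) for t in vocab}
    # Sort by descending score, breaking ties alphabetically.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]

def tfidf_vector(text, features, docs):
    """Weight each selected term by term frequency times smoothed IDF.

    Stand-in for the chapter's term weight measure (assumption).
    """
    counts = Counter(text.split())
    n = len(docs)
    vec = []
    for t in features:
        df = sum(1 for d, _ in docs if t in d.split())
        idf = math.log((1 + n) / (1 + df)) + 1
        vec.append(counts[t] * idf)
    return vec

features = select_features(DOCS, k=2)
print(features)  # ['bank', 'match']
print(tfidf_vector("the bank raised the interest rate", features, DOCS))
```

Terms such as "the", which occur in every class, receive a chi-square score of zero and are dropped, while "bank" and "match" separate the two classes perfectly and are kept. The resulting vectors could then be passed to any of the six classifiers named in the abstract.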

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Palacharla, R.K., Vatsavayi, V.K. (2024). A Novel Approach for Text Classification Using Feature Selection Algorithm and Term Weight Measures. In: Lin, F.M., Patel, A., Kesswani, N., Sambana, B. (eds) Accelerating Discoveries in Data Science and Artificial Intelligence I. ICDSAI 2023. Springer Proceedings in Mathematics & Statistics, vol 421. Springer, Cham. https://doi.org/10.1007/978-3-031-51167-7_28
