
Corpus Creation in Telugu: Sentiment Classification Using Ensemble Approaches

  • Original Research
  • Published:
SN Computer Science

Abstract

Social media sentiment analysis examines consumer posts in online communities. It helps firms track brand sentiment, that is, whether customers feel favorably or unfavorably about their goods and services. Social media use is changing rapidly as companies adopt innovative tools, technologies, and products for various applications, and reviewing and giving opinions on these offerings has become common practice worldwide. Sentiment is widely expressed in English, yet being able to express and analyze opinions in a regional language would greatly help people who are more fluent in that language, and researchers are paying increasing attention to India's regional languages. Sentiment analysis in a regional language is a complex task because it requires a standard annotated corpus. Telugu is a regional language with a wealth of text readily available on social media, but sentence-level sentiment labels for Telugu are hard to obtain. The objective of this work is to devise a framework for Telugu sentiment analysis based on supervised learning. To this end, we constructed a corpus named "SentiKanna," which covers multiple domains, such as recipes, tourism, and movies, written in Telugu script. SentiKanna consists of Telugu sentences extracted from various sources and manually annotated according to our annotation guidelines; annotation of 3113 reviews across all domains has been completed. Using this corpus, we evaluate binary classification of reviews. We tested several machine learning algorithms and obtained encouraging results, with ensemble methods performing best. Our comparative analysis of the Random Forest (RF) and Extreme Gradient Boosting (XGB) models shows that RF outperforms XGB in all domains.
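The abstract reports that ensemble methods perform best and that RF, which is a bagging ensemble of decision trees, beats XGB. As a minimal sketch of the bagging-and-voting idea only, the Python below trains several base classifiers on bootstrap resamples of a tiny labelled corpus and combines them by majority vote. It is an illustration under stated assumptions, not the authors' pipeline: the base learner is multinomial Naive Bayes rather than a decision tree (so this is not a Random Forest), whitespace splitting stands in for proper Telugu preprocessing, and the toy reviews, labels, and function names (`train_nb`, `predict_nb`, `train_bagged`, `predict_bagged`) are hypothetical.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Telugu script separates words with spaces, so whitespace splitting is a
    # rough first approximation (no stemming or punctuation handling here).
    return text.split()


def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model with add-one (Laplace) smoothing."""
    vocab = {w for d in docs for w in tokenize(d)}
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    for doc, label in zip(docs, labels):
        word_counts[label].update(tokenize(doc))
    return vocab, class_counts, word_counts


def predict_nb(model, text):
    """Return the class with the highest log-posterior for one review."""
    vocab, class_counts, word_counts = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):  # only classes seen during training
        total = sum(word_counts[label].values())
        score = math.log(class_counts[label] / n_docs)
        for w in tokenize(text):
            if w in vocab:  # words unseen in training are ignored
                score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def train_bagged(docs, labels, n_models=25, seed=0):
    """Bagging: fit each base model on a bootstrap resample of the corpus.

    The base learner is Naive Bayes, a stand-in assumption; RF would use
    randomized decision trees here instead.
    """
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(docs)) for _ in range(len(docs))]
        models.append(train_nb([docs[i] for i in idx], [labels[i] for i in idx]))
    return models


def predict_bagged(models, text):
    """Combine base models by majority vote (an odd n_models avoids ties)."""
    votes = Counter(predict_nb(m, text) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy reviews (1 = positive, 0 = negative), one pair per domain
# named in the abstract: movies, recipes, tourism. Not taken from SentiKanna.
reviews = [
    "సినిమా బాగుంది", "వంటకం బాగుంది", "ప్రదేశం బాగుంది",
    "సినిమా బాగాలేదు", "వంటకం బాగాలేదు", "ప్రదేశం బాగాలేదు",
]
labels = [1, 1, 1, 0, 0, 0]

models = train_bagged(reviews, labels)
print(predict_bagged(models, "బాగుంది"), predict_bagged(models, "బాగాలేదు"))
```

Each bootstrap model sees a slightly different sample, and voting smooths out the errors of individual models; replacing the Naive Bayes base learner with randomized decision trees would move the sketch closer to RF. Boosting methods such as XGB differ in that they fit trees sequentially, each one correcting the errors of the ensemble built so far.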

Data availability

The data that support the findings of this study are available on request from the authors.

Acknowledgements

The authors thank VIT-AP University, the editors, and the anonymous reviewers for their insightful comments and recommendations during the review process.

Funding

No funding was received for this research.

Author information

Corresponding author

Correspondence to Kannaiah Chattu.

Ethics declarations

Conflict of Interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Advances in Computational Approaches for Image Processing, Wireless Networks, Cloud Applications and Network Security” guest edited by P. Raviraj, Maode Ma and Roopashree H R.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chattu, K., Sumathi, D. Corpus Creation in Telugu: Sentiment Classification Using Ensemble Approaches. SN COMPUT. SCI. 4, 860 (2023). https://doi.org/10.1007/s42979-023-02314-x

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s42979-023-02314-x
