Corpus Creation in Telugu: Sentiment Classification Using Ensemble Approaches

Chattu, Kannaiah; Sumathi, D.

doi:10.1007/s42979-023-02314-x

Original Research
Published: 08 November 2023

SN Computer Science, Volume 4, article number 860 (2023)

Kannaiah Chattu¹ & D. Sumathi¹

Abstract

Social media sentiment analysis examines consumer posts in online communities. It helps firms track brand sentiment and determine whether customers feel favorably or unfavorably about their goods and services. Social media is changing rapidly as companies adopt innovative tools, technologies, and products for various applications, and analyzing these offerings and sharing opinions on them has become common practice worldwide. Sentiments are widely expressed in English, yet supporting regional languages would greatly help people who are more fluent in those languages. Researchers are paying increasing attention to the regional languages of India. Analyzing sentiment in a regional language is complex because it requires a standard corpus. Telugu is a regional language with a wealth of data readily available on social media, but class labels for Telugu sentences are difficult to obtain. The objective of this work is to devise a framework based on supervised learning. To this end, we construct a corpus named "SentiKanna," which covers multiple domains, such as recipes, tourism, and movies, written in Telugu script. The SentiKanna corpus is a collection of Telugu sentences extracted from various sources and manually annotated according to our annotation guidelines. Annotation has been completed for 3113 reviews across all domains. Using the constructed corpus, we evaluate binary sentiment classification of a given review. We tested a range of machine learning algorithms and obtained encouraging results, with ensemble methods performing best. Our work provides a comparative analysis of the Random Forest (RF) and Extreme Gradient Boosting (XGB) models; RF outperforms XGB in all domains.
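To make the ensemble idea in the abstract concrete, the following is a minimal sketch of bagging with majority voting, the principle underlying Random Forest, applied to binary sentiment labels. It is not the authors' implementation: the toy reviews are hypothetical English placeholders standing in for Telugu text, the one-word decision stumps replace full decision trees, and all function names (`tokenize`, `train_stump`, `train_forest`, `predict`) are illustrative assumptions. It uses only the Python standard library.

```python
import random
from collections import Counter

# Hypothetical toy reviews: English placeholders standing in for Telugu text.
# Label 1 = positive, 0 = negative. None of this comes from SentiKanna.
REVIEWS = [
    ("tasty recipe great flavour", 1),
    ("wonderful trip beautiful temple", 1),
    ("great movie wonderful acting", 1),
    ("beautiful place tasty food", 1),
    ("boring movie bad acting", 0),
    ("bad food terrible service", 0),
    ("terrible trip dirty hotel", 0),
    ("dirty kitchen boring recipe", 0),
]


def tokenize(text):
    """Whitespace tokenizer; real Telugu text would need proper preprocessing."""
    return set(text.split())


def majority(labels, default):
    """Most common label, or `default` when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def train_stump(sample, rng, n_features):
    """Fit a one-word decision stump on a bootstrap sample.

    Only a random subset of the vocabulary is considered, mirroring the
    feature subsampling used by random forests.
    """
    vocab = sorted({w for toks, _ in sample for w in toks})
    candidates = rng.sample(vocab, min(n_features, len(vocab)))
    overall = majority([y for _, y in sample], 1)
    best = None
    for word in candidates:
        present = majority([y for toks, y in sample if word in toks], overall)
        absent = majority([y for toks, y in sample if word not in toks], overall)
        correct = sum((present if word in toks else absent) == y for toks, y in sample)
        if best is None or correct > best[0]:
            best = (correct, word, present, absent)
    return best[1:]  # (word, label_if_present, label_if_absent)


def train_forest(data, n_stumps=25, n_features=5, seed=0):
    """Bagging: each stump is trained on a resample drawn with replacement."""
    rng = random.Random(seed)
    tokenized = [(tokenize(text), y) for text, y in data]
    return [
        train_stump(rng.choices(tokenized, k=len(tokenized)), rng, n_features)
        for _ in range(n_stumps)
    ]


def predict(forest, text):
    """Majority vote over all stumps (an odd stump count avoids ties)."""
    toks = tokenize(text)
    votes = [present if word in toks else absent for word, present, absent in forest]
    return majority(votes, 1)


forest = train_forest(REVIEWS)
train_accuracy = sum(predict(forest, t) == y for t, y in REVIEWS) / len(REVIEWS)
print(f"training accuracy: {train_accuracy:.2f}")
```

Gradient boosting methods such as XGB differ in that each new learner is fitted to correct the errors of the learners before it, rather than being trained independently on a resample. In practice one would likely rely on established library implementations of both models rather than a hand-written sketch like this one.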

Data availability

The data that support the findings of this study are available on request from the authors.

Acknowledgements

The authors thank VIT-AP University, the editors, and the anonymous reviewers for their insightful comments and recommendations during the review process.

Funding

No funding was received for this research.

Author information

Authors and Affiliations

School of Computer Science and Engineering, VIT-AP University, Amaravati, Andhra Pradesh, India
Kannaiah Chattu & D. Sumathi

Corresponding author

Correspondence to Kannaiah Chattu.

Ethics declarations

Conflict of Interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Advances in Computational Approaches for Image Processing, Wireless Networks, Cloud Applications and Network Security” guest edited by P. Raviraj, Maode Ma and Roopashree H R.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chattu, K., Sumathi, D. Corpus Creation in Telugu: Sentiment Classification Using Ensemble Approaches. SN COMPUT. SCI. 4, 860 (2023). https://doi.org/10.1007/s42979-023-02314-x

Received: 23 August 2023
Accepted: 09 September 2023
Published: 08 November 2023
DOI: https://doi.org/10.1007/s42979-023-02314-x
