Abstract
Social media sentiment analysis focuses on consumer posts in online communities. It helps firms track brand sentiment and determine whether customers feel favorably or unfavorably about their goods and services. Social media is changing rapidly as many companies adopt innovative tools, technologies, and products for various applications. Analyzing these tools and offering opinions on them has become common practice worldwide. Sentiment is most often expressed in English, yet supporting regional languages would greatly help people who are more comfortable in their own language. Researchers are paying increasing attention to the regional languages of India. Analyzing sentiment in a regional language is complex because it requires a standard corpus. Telugu is a regional language with a wealth of data readily available on social media, but class labels for Telugu sentences are hard to find. The objective of this work is to devise a framework for Telugu sentiment analysis based on supervised learning. To this end, we constructed a corpus named "SentiKanna," which covers multiple domains such as recipes, tourism, and movies, written in Telugu script. The corpus contains Telugu sentences extracted from various sources and manually annotated according to our annotation guidelines; 3113 reviews across all domains have been annotated. Using the constructed corpus, we evaluate binary classification of reviews. We tested several machine learning algorithms and obtained encouraging results, with ensemble methods performing best. We provide a comparative analysis of the Random Forest (RF) and Extreme Gradient Boosting (XGB) models, in which RF outperforms XGB in all domains.
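The abstract names Random Forest and XGBoost but gives no implementation detail. As a rough illustration of the ensemble idea only, the sketch below bags one-word decision stumps over bag-of-words features and classifies a review by majority vote. It is a toy stand-in, not the authors' RF or XGB pipeline: the sentences are hypothetical examples (not drawn from SentiKanna), and the function names and the parameters `n_stumps` and `seed` are assumptions made for this example.

```python
import random
from collections import Counter

# Hypothetical toy reviews (1 = positive, 0 = negative); NOT taken from SentiKanna.
TOY_REVIEWS = [
    ("సినిమా చాలా బాగుంది", 1),
    ("రుచి బాగుంది", 1),
    ("చాలా బాగుంది", 1),
    ("సినిమా బాగాలేదు", 0),
    ("రుచి బాగాలేదు", 0),
    ("చాలా బాగాలేదు", 0),
]


def tokens(text):
    """Whitespace tokenization into a set of words (bag-of-words presence)."""
    return set(text.split())


def majority(labels, default):
    """Most common label, or `default` when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(sample, vocab_subset):
    """Choose the word whose presence/absence split makes the fewest errors."""
    overall = majority([y for _, y in sample], 1)
    best = None
    for word in vocab_subset:
        with_w = [y for toks, y in sample if word in toks]
        without = [y for toks, y in sample if word not in toks]
        pred_with = majority(with_w, overall)
        pred_without = majority(without, overall)
        errors = sum(y != pred_with for y in with_w)
        errors += sum(y != pred_without for y in without)
        if best is None or errors < best[0]:
            best = (errors, word, pred_with, pred_without)
    return best[1:]  # (word, label if present, label if absent)


def fit_ensemble(reviews, n_stumps=25, seed=0):
    """Bagging: each stump sees a bootstrap sample and a random word subset."""
    rng = random.Random(seed)
    data = [(tokens(text), label) for text, label in reviews]
    vocab = sorted(set().union(*(toks for toks, _ in data)))
    k = max(1, int(len(vocab) ** 0.5))  # size of the random feature subset
    stumps = []
    for _ in range(n_stumps):
        sample = [rng.choice(data) for _ in data]  # sample with replacement
        subset = rng.sample(vocab, k)
        stumps.append(fit_stump(sample, subset))
    return stumps


def predict(stumps, text):
    """Majority vote over all stumps for one review."""
    toks = tokens(text)
    votes = [pw if word in toks else pwo for word, pw, pwo in stumps]
    return majority(votes, 1)
```

Calling `predict(fit_ensemble(TOY_REVIEWS), "రుచి బాగుంది")` returns 0 or 1. On a sample this small the vote can depend on the seed, so the sketch shows only the mechanics of bagging and voting, not expected accuracy.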
Data availability
The data that support the findings of this study are available on request from the authors.
Acknowledgements
The authors thank VIT-AP University, the editors, and the anonymous reviewers for their insightful comments and recommendations during the review process.
Funding
No funding was received for this research.
Ethics declarations
Conflict of Interest
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the topical collection “Advances in Computational Approaches for Image Processing, Wireless Networks, Cloud Applications and Network Security” guest edited by P. Raviraj, Maode Ma and Roopashree H R.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chattu, K., Sumathi, D. Corpus Creation in Telugu: Sentiment Classification Using Ensemble Approaches. SN COMPUT. SCI. 4, 860 (2023). https://doi.org/10.1007/s42979-023-02314-x
DOI: https://doi.org/10.1007/s42979-023-02314-x