Abstract
Cyberbullying and hate speech are two of the most significant problems in today's cyberspace. Automated artificial intelligence models could help address this critical problem by detecting and removing online hate speech. As artificial intelligence continues to permeate numerous industries and drive critical decisions, a variety of explainable AI strategies are being developed to make model judgments and their justifications intelligible to people. Our study focuses on code-mixed language (a mix of Hindi and English) and the Indian subcontinent; this language combination is used extensively across SAARC nations. Three transformer-based models and one machine learning model were trained and fine-tuned on the modified HASOC Identification of Conversational Hate-Speech in Code-Mixed Languages (ICHCL) dataset for hate speech classification. Several explainability techniques, namely Local Interpretable Model-agnostic Explanations (LIME), SHapley Additive exPlanations (SHAP), and model attention, were applied to the respective models to analyze model behavior. The analysis suggests that better-trained models, together with a systematic comparison of Explainable Artificial Intelligence (XAI) techniques, would provide better insight.
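The core idea behind the LIME technique named in the abstract can be sketched in a few lines: perturb the input by masking tokens, query the classifier on each perturbation, and fit a linear surrogate whose coefficients serve as token importances. The snippet below is a minimal illustration of that idea only, using a hypothetical toy bag-of-words scorer (`TOXIC_WEIGHTS`, `predict`) in place of the paper's fine-tuned transformer models and the `lime` library.

```python
import numpy as np

# Hypothetical toy "hate speech" scorer: a stand-in for a trained classifier.
TOXIC_WEIGHTS = {"hate": 2.0, "stupid": 1.5, "love": -2.0}

def predict(tokens):
    """Probability-like score from a toy bag-of-words model."""
    z = sum(TOXIC_WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + np.exp(-z))

def lime_explain(tokens, n_samples=500, seed=0):
    """Fit a linear surrogate on random token-masking perturbations."""
    rng = np.random.default_rng(seed)
    masks = rng.integers(0, 2, size=(n_samples, len(tokens)))
    masks[0] = 1  # keep the unperturbed instance among the samples
    ys = np.array([predict([t for t, keep in zip(tokens, m) if keep])
                   for m in masks])
    # Least squares on (mask -> score): one coefficient per token,
    # plus an intercept column of ones.
    coef, *_ = np.linalg.lstsq(
        np.hstack([masks, np.ones((n_samples, 1))]), ys, rcond=None)
    return dict(zip(tokens, coef[:-1]))

weights = lime_explain(["i", "hate", "this", "stupid", "movie"])
print(max(weights, key=weights.get))  # prints: hate
```

In the full LIME method the perturbed samples are additionally distance-weighted toward the original instance; for hate speech classifiers the interest is in exactly this kind of per-token attribution, so annotators can check whether the model attends to slurs or to spurious context words.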
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Yadav, S., Kaushik, A., McDaid, K. (2023). Understanding Interpretability: Explainable AI Approaches for Hate Speech Classifiers. In: Longo, L. (eds) Explainable Artificial Intelligence. xAI 2023. Communications in Computer and Information Science, vol 1903. Springer, Cham. https://doi.org/10.1007/978-3-031-44070-0_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44069-4
Online ISBN: 978-3-031-44070-0
eBook Packages: Computer Science, Computer Science (R0)