A Novel Hybrid Model of Word Embedding and Deep Learning to Identify Hate and Abusive Content on Social Media Platform

Kumar, Sachin; Bhagat, Ankit Kumar; Erugurala, Akash; Mirza, Amna; Jha, Alok Nikhil; Verma, Ajit Kumar

doi:10.1007/978-981-99-9836-4_4


Part of the book series: Frontiers of Artificial Intelligence, Ethics and Multidisciplinary Applications (FAIE)

Included in the following conference series:

International Conference on Frontiers of Artificial Intelligence, Ethics, and Multidisciplinary Applications


Abstract

Digital platforms such as Facebook, Twitter, Reddit, Instagram, and other social media services have become the dominant medium for exchanging information, connecting people, expressing emotions and feelings, and influencing audiences and communities. Some users exploit these platforms to post offensive, hateful, and harassing content when expressing their views and dissatisfaction. Such content can hurt or harm other social media users and communities and can lead to law-and-order problems in society. Intelligent systems and models are therefore needed to detect hate speech and mitigate its effects. Artificial intelligence and machine learning can help detect social media posts, comments, and replies that are hateful, abusive, or offensive. Such content is also correlated with race, gender, sexuality, religion, and age. The objective of this work is to evaluate the performance of the proposed model against other machine learning and deep learning algorithms on newly collected data. The experimental results show that the proposed model, a CNN with word embeddings, performs well in hate speech detection, achieving an accuracy of 83%. The word embeddings' contextual understanding of language and the model's ability to capture complex semantic relationships contribute significantly to its superior performance compared with the other traditional machine learning and deep learning models.
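The abstract names the model family (a CNN over word embeddings) but not its architecture. As a rough illustration only, the following minimal NumPy sketch shows the forward pass of one common text-CNN layout: embedding lookup, a single 1D convolution, ReLU, global max pooling, and a sigmoid output. The layer sizes, the function name `text_cnn_forward`, and the random untrained weights are all assumptions made for illustration; this is not the authors' implementation and does not reproduce the reported 83% accuracy.

```python
import numpy as np

def text_cnn_forward(token_ids, embedding, filters, bias, w_out, b_out):
    """Return an (untrained) probability that one tokenized post is hateful.

    token_ids: (seq_len,) integer vocabulary indices
    embedding: (vocab_size, emb_dim) word-embedding matrix
    filters:   (num_filters, kernel_size, emb_dim) convolution kernels
    bias:      (num_filters,) convolution biases
    w_out:     (num_filters,) weights of the single output unit
    b_out:     scalar output bias
    """
    x = embedding[token_ids]                       # (seq_len, emb_dim)
    kernel_size = filters.shape[1]
    n_windows = len(token_ids) - kernel_size + 1   # "valid" convolution
    # Stack every window of kernel_size consecutive word vectors.
    windows = np.stack([x[i:i + kernel_size] for i in range(n_windows)])
    # Each filter is dotted with each window: (n_windows, num_filters).
    feature_maps = np.einsum("wke,fke->wf", windows, filters) + bias
    feature_maps = np.maximum(feature_maps, 0.0)   # ReLU
    pooled = feature_maps.max(axis=0)              # global max pooling
    logit = pooled @ w_out + b_out
    return 1.0 / (1.0 + np.exp(-logit))            # sigmoid

# Hypothetical sizes and random weights, purely for demonstration.
rng = np.random.default_rng(0)
vocab_size, emb_dim, num_filters, kernel_size = 50, 8, 4, 3
embedding = rng.normal(size=(vocab_size, emb_dim))
filters = rng.normal(size=(num_filters, kernel_size, emb_dim))
bias = np.zeros(num_filters)
w_out = rng.normal(size=num_filters)
token_ids = np.array([3, 17, 42, 8, 0, 25])        # one already-tokenized post

p = text_cnn_forward(token_ids, embedding, filters, bias, w_out, 0.0)
print(f"P(hateful) with untrained weights: {p:.3f}")
```

In practice the embedding matrix and filters would be learned from labeled posts (or the embeddings initialized from pretrained vectors), and a deep learning framework would typically handle batching and training.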



Author information

Authors and Affiliations

Cluster Innovation Centre, University of Delhi, New Delhi, India
Sachin Kumar, Ankit Kumar Bhagat & Akash Erugurala
Sarojini Naidu Centre for Women Studies, Jamia Millia Islamia, New Delhi, India
Amna Mirza
Indraprastha Institute of Information Technology, New Delhi, India
Alok Nikhil Jha
Faculty of Engineering and Natural Sciences, Western Norway University of Applied Sciences, Haugesund, Norway
Ajit Kumar Verma

Corresponding author

Correspondence to Sachin Kumar.

Editor information

Editors and Affiliations

Department of Computer and Electrical Technology, University of Stavanger, Stavanger, Norway
Mina Farmanbar
Department of Digital Industry Technologies, National and Kapodistrian University of Athens, Euboea, Greece
Maria Tzamtzi
Faculty of Engineering and Natural Sciences, Western Norway University of Applied Sciences, Haugesund, Norway
Ajit Kumar Verma
Department of Computer and Electrical Technology, University of Stavanger, Stavanger, Norway
Antorweep Chakravorty

About this paper

Cite this paper

Kumar, S., Bhagat, A.K., Erugurala, A., Mirza, A., Jha, A.N., Verma, A.K. (2024). A Novel Hybrid Model of Word Embedding and Deep Learning to Identify Hate and Abusive Content on Social Media Platform. In: Farmanbar, M., Tzamtzi, M., Verma, A.K., Chakravorty, A. (eds) Frontiers of Artificial Intelligence, Ethics, and Multidisciplinary Applications. FAIEMA 2023. Frontiers of Artificial Intelligence, Ethics and Multidisciplinary Applications. Springer, Singapore. https://doi.org/10.1007/978-981-99-9836-4_4

DOI: https://doi.org/10.1007/978-981-99-9836-4_4
Published: 25 February 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-9835-7
Online ISBN: 978-981-99-9836-4
eBook Packages: Business and Management, Business and Management (R0)
