Abstract
Quora is a platform for gaining and sharing knowledge, where people ask questions and connect with contributors who offer unique insights and quality answers. In September 2018, Quora reportedly reached 300 million monthly users. At that scale, it is no surprise that many people ask duplicate questions, that is, questions that have the same intent. Multiple questions with the same intent force seekers to spend more time finding the best answer and make writers feel they must answer several versions of the same question. Quora values canonical questions because they give active seekers and writers a better experience and offer more long-term value to both groups. The main aim of this work is to apply natural language processing (NLP) techniques to extract features from the given dataset and to apply the XGBoost machine learning model to those features to predict the similarity of question pairs.
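As a rough illustration of the feature-extraction step named in the abstract, the sketch below computes a few simple lexical-overlap features for one question pair. The abstract does not list the features actually used, so the function names (`tokenize`, `pair_features`) and the chosen features (shared-word count, Jaccard overlap, word share, character-length difference) are assumptions made here for illustration only.

```python
import re
from typing import Dict, Set


def tokenize(question: str) -> Set[str]:
    """Lowercase a question and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", question.lower()))


def pair_features(q1: str, q2: str) -> Dict[str, float]:
    """Compute illustrative lexical features for a question pair.

    These features are hypothetical examples, not the chapter's feature set.
    """
    t1, t2 = tokenize(q1), tokenize(q2)
    common = t1 & t2
    union = t1 | t2
    total = len(t1) + len(t2)
    return {
        # Number of distinct words the two questions share.
        "common_words": float(len(common)),
        # Jaccard overlap: shared words divided by all distinct words.
        "jaccard": len(common) / len(union) if union else 0.0,
        # Share of word occurrences (across both questions) that are common.
        "word_share": 2 * len(common) / total if total else 0.0,
        # Absolute difference in character length.
        "len_diff": float(abs(len(q1) - len(q2))),
    }


features = pair_features("How do I learn Python?", "How can I learn Python?")
print(features)
```

One such feature dictionary per question pair, stacked into a matrix alongside the duplicate labels, is the kind of input that a gradient-boosted classifier such as `xgboost.XGBClassifier` could be trained on; that training step is not shown here.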
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Chunamari, A., Yashas, M., Basu, A., Anirudh, D.K., Soumya, C.S. (2022). Quora Question Pairs Using XG Boost. In: Shetty, N.R., Patnaik, L.M., Nagaraj, H.C., Hamsavath, P.N., Nalini, N. (eds) Emerging Research in Computing, Information, Communication and Applications. Lecture Notes in Electrical Engineering, vol 790. Springer, Singapore. https://doi.org/10.1007/978-981-16-1342-5_55
DOI: https://doi.org/10.1007/978-981-16-1342-5_55
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-1341-8
Online ISBN: 978-981-16-1342-5
eBook Packages: Engineering, Engineering (R0)