Quora Question Pairs Using XG Boost

  • Conference paper
Emerging Research in Computing, Information, Communication and Applications

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 790)

Abstract

Quora is a platform for gaining and sharing knowledge, where people ask questions and connect with others who contribute unique insights and quality answers. In September 2018, Quora reportedly reached 300 million monthly users. With that many visitors, it is no surprise that many people ask duplicate questions, that is, questions with the same intent. Multiple questions with the same intent force seekers to spend more time finding the best answer and make writers feel they must answer several versions of the same question. Quora values canonical questions because they give active seekers and writers a better experience and offer more value to both groups in the long term. The main aim of this work is to apply natural language processing (NLP) techniques to extract features from the given dataset of question pairs and to apply the XGBoost machine learning model to predict the similarity of each pair.
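The abstract does not list the features the authors extract, so the following is only a minimal sketch of what hand-crafted token-overlap features for a question pair might look like. The function names `tokenize` and `pair_features`, the regular-expression tokenizer, and the particular feature set are all illustrative assumptions, not the paper's method.

```python
import re


def tokenize(text):
    """Lowercase a question and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def pair_features(q1, q2):
    """Return simple overlap features for one question pair.

    Hypothetical feature set chosen for illustration only.
    """
    t1, t2 = tokenize(q1), tokenize(q2)
    s1, s2 = set(t1), set(t2)
    common = s1 & s2
    union = s1 | s2
    total = len(s1) + len(s2)
    return {
        # Number of distinct words the two questions share
        "common_words": len(common),
        # Jaccard similarity of the two word sets (0.0 if both are empty)
        "jaccard": len(common) / len(union) if union else 0.0,
        # Shared words relative to the combined vocabulary sizes
        "word_share": len(common) / total if total else 0.0,
        # Absolute difference in token counts
        "len_diff": abs(len(t1) - len(t2)),
        # 1 if both questions start with the same word, else 0
        "same_first_word": int(bool(t1) and bool(t2) and t1[0] == t2[0]),
    }


if __name__ == "__main__":
    print(pair_features("How do I learn Python?",
                        "How can I learn Python quickly?"))
```

Rows of such features, one per question pair, could then be passed with the duplicate labels to a gradient-boosted tree classifier such as `xgboost.XGBClassifier`. That pairing is an assumption about how the two stages fit together, since the abstract names the model but not its configuration.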

Corresponding author

Correspondence to C. S. Soumya.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Chunamari, A., Yashas, M., Basu, A., Anirudh, D.K., Soumya, C.S. (2022). Quora Question Pairs Using XG Boost. In: Shetty, N.R., Patnaik, L.M., Nagaraj, H.C., Hamsavath, P.N., Nalini, N. (eds) Emerging Research in Computing, Information, Communication and Applications. Lecture Notes in Electrical Engineering, vol 790. Springer, Singapore. https://doi.org/10.1007/978-981-16-1342-5_55

  • DOI: https://doi.org/10.1007/978-981-16-1342-5_55

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-1341-8

  • Online ISBN: 978-981-16-1342-5

  • eBook Packages: Engineering, Engineering (R0)
