Document Pre-processing

Sentiment Analysis in the Medical Domain

Abstract

Medical sentiment analysis can be viewed as a two-step process: topic detection (or health mention classification), followed by the actual sentiment analysis. Health mention classification can be realised with topic detection methods such as topic modelling or named entity extraction. Before expressed sentiments and their polarities can be analysed, the text has to be pre-processed and the features relevant for classification have to be identified. This chapter outlines the individual pre-processing tasks and presents example methods for each of them, covering text normalisation, feature extraction and feature selection.
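
The three pre-processing tasks named in the abstract can be illustrated with a minimal sketch. It is not taken from the chapter: the function names, the tiny stop-word list, the sample sentences and the `min_df` threshold are all assumptions chosen for illustration. Normalisation is reduced to lowercasing, tokenising and stop-word removal, feature extraction to a bag-of-words count, and feature selection to a simple document-frequency cut-off, which is only one of many possible selection criteria.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real systems use fuller lists).
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "of", "to", "my", "i", "it"}


def normalise(text):
    """Text normalisation: lowercase, tokenise on letter runs, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def extract_features(docs):
    """Feature extraction: one bag-of-words term counter per document."""
    return [Counter(normalise(d)) for d in docs]


def select_features(bags, min_df=2):
    """Feature selection: keep terms occurring in at least min_df documents."""
    # Iterating over a Counter yields each distinct term once per document.
    doc_freq = Counter(term for bag in bags for term in bag)
    return sorted(t for t, n in doc_freq.items() if n >= min_df)


def vectorise(bags, vocab):
    """Turn each bag into a count vector over the selected vocabulary."""
    return [[bag[t] for t in vocab] for bag in bags]


# Hypothetical example sentences, invented for this sketch.
docs = [
    "The headache was severe and the pain is back.",
    "My headache is gone, I feel great.",
    "Severe pain after the surgery.",
]

bags = extract_features(docs)
vocab = select_features(bags, min_df=2)
vectors = vectorise(bags, vocab)

print(vocab)
print(vectors)
```

The resulting count vectors could then be passed to any classifier. In practice, a library tokeniser and a domain-specific concept mapper would likely replace the regular expression used here.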

Notes

  1. https://www.nltk.org

  2. https://lhncbc.nlm.nih.gov/ii/tools/MetaMap.html

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Denecke, K. (2023). Document Pre-processing. In: Sentiment Analysis in the Medical Domain. Springer, Cham. https://doi.org/10.1007/978-3-031-30187-2_9

  • DOI: https://doi.org/10.1007/978-3-031-30187-2_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-30186-5

  • Online ISBN: 978-3-031-30187-2

  • eBook Packages: Computer Science, Computer Science (R0)
