Document Pre-processing

Denecke, Kerstin

doi:10.1007/978-3-031-30187-2_9

Abstract

Medical sentiment analysis can be considered as a two-step process comprising topic detection or health mention classification and the actual sentiment analysis. Health mention classification can be realised using topic detection methods such as topic modelling or named entity extraction. To be able to analyse expressed sentiments and their polarities, the text has to be pre-processed and relevant features have to be identified for classification. In this chapter, the different pre-processing tasks will be outlined and example methods to realise them will be presented. These include methods for text normalisation, feature extraction and feature selection.
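
The three pre-processing stages named in the abstract can be illustrated with a minimal sketch. It is not the chapter's own method: the stop-word list, the example sentences, the function names and the document-frequency threshold are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "of", "to", "my", "i", "after"}


def normalise(text):
    """Text normalisation: lowercase, keep alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def extract_features(docs):
    """Feature extraction: one bag-of-words Counter per document."""
    return [Counter(normalise(d)) for d in docs]


def select_features(bags, min_df=2):
    """Feature selection: keep tokens that occur in at least min_df documents."""
    # Iterating a Counter yields each distinct token once per document,
    # so this counts document frequency, not total term frequency.
    doc_freq = Counter(tok for bag in bags for tok in bag)
    return sorted(tok for tok, n in doc_freq.items() if n >= min_df)


# Invented example sentences (assumption, not taken from the chapter).
docs = [
    "The headache was terrible after the treatment.",
    "My headache is gone, the treatment worked!",
    "Terrible side effects.",
]

bags = extract_features(docs)
vocab = select_features(bags, min_df=2)
print(vocab)  # ['headache', 'treatment', 'terrible'] in sorted order: ['headache', 'terrible', 'treatment']
```

A document-frequency filter is only one simple selection criterion; a real pipeline would more likely rely on an established NLP library and a statistically motivated selection method.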

Author information

Authors and Affiliations

Bern University of Applied Sciences, Biel, Switzerland
Kerstin Denecke

About this chapter

Cite this chapter

Denecke, K. (2023). Document Pre-processing. In: Sentiment Analysis in the Medical Domain. Springer, Cham. https://doi.org/10.1007/978-3-031-30187-2_9

DOI: https://doi.org/10.1007/978-3-031-30187-2_9
Published: 24 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30186-5
Online ISBN: 978-3-031-30187-2
eBook Packages: Computer Science, Computer Science (R0)
