Abstract
Medical sentiment analysis can be considered a two-step process: topic detection (or health mention classification), followed by the actual sentiment analysis. Health mention classification can be realised with topic detection methods such as topic modelling or named entity extraction. Before the expressed sentiments and their polarities can be analysed, the text has to be pre-processed and the features relevant for classification have to be identified. This chapter outlines the different pre-processing tasks and presents example methods for realising them, including methods for text normalisation, feature extraction and feature selection.
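The sketch below illustrates the three pre-processing tasks named above in sequence. It is a minimal illustration under stated assumptions, not the chapter's own method. It uses a hypothetical stop-word list and regex for text normalisation, bag-of-words term counts for feature extraction, and chi-square scoring against binary polarity labels for feature selection. The posts, the labels and all function names (`normalise`, `extract_features`, `chi_square`, `select_features`) are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real system would use a curated one.
STOP_WORDS = {"a", "an", "the", "and", "or", "is", "was", "i", "my", "it", "of", "to", "this"}


def normalise(text: str) -> list[str]:
    """Text normalisation: lowercase, replace non-letters with spaces, tokenise, drop stop words."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def extract_features(docs: list[list[str]]) -> tuple[list[str], list[Counter]]:
    """Feature extraction: bag-of-words term counts per document plus a sorted shared vocabulary."""
    vocabulary = sorted({tok for doc in docs for tok in doc})
    return vocabulary, [Counter(doc) for doc in docs]


def chi_square(term: str, bags: list[Counter], labels: list[int]) -> float:
    """Chi-square statistic of the 2x2 table (term present/absent vs. label 1/0)."""
    a = b = c = d = 0
    for bag, label in zip(bags, labels):
        present = bag[term] > 0
        if present and label:
            a += 1  # term present, positive document
        elif present:
            b += 1  # term present, negative document
        elif label:
            c += 1  # term absent, positive document
        else:
            d += 1  # term absent, negative document
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_features(vocabulary: list[str], bags: list[Counter], labels: list[int], k: int) -> list[str]:
    """Feature selection: keep the k terms with the highest chi-square score (ties broken alphabetically)."""
    ranked = sorted(vocabulary, key=lambda t: (-chi_square(t, bags, labels), t))
    return ranked[:k]


# Usage with hypothetical patient posts and polarity labels (1 = positive, 0 = negative).
posts = [
    "The new medication helped a lot, I feel great!",
    "Terrible side effects, the pain got worse.",
    "My doctor was great and the treatment helped.",
    "Worse than before: constant pain and nausea.",
]
labels = [1, 0, 1, 0]
tokens = [normalise(p) for p in posts]
vocab, bags = extract_features(tokens)
print(select_features(vocab, bags, labels, k=4))  # ['great', 'helped', 'pain', 'worse']
```

Chi-square is only one of several possible selection criteria; information gain or document frequency thresholds could be swapped in at `select_features` without changing the normalisation or extraction steps.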
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Denecke, K. (2023). Document Pre-processing. In: Sentiment Analysis in the Medical Domain. Springer, Cham. https://doi.org/10.1007/978-3-031-30187-2_9
DOI: https://doi.org/10.1007/978-3-031-30187-2_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30186-5
Online ISBN: 978-3-031-30187-2
eBook Packages: Computer Science, Computer Science (R0)