Abstract
This work focuses on enriching existing Portuguese word embeddings with visual information. The combined text-image embeddings were compared against their text-only counterparts on common NLP tasks in two domains: general news and geosciences. The results show improved performance on several tasks, which indicates that fusing visual information into word embeddings can be useful for certain tasks.
Acknowledgements
We would like to thank Petrobras and the Brazilian National Council for Scientific and Technological Development (CNPq) for their financial support.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Consoli, B.S., Vieira, R. (2022). Enriching Portuguese Word Embeddings with Visual Information. In: Pinheiro, V., et al. Computational Processing of the Portuguese Language. PROPOR 2022. Lecture Notes in Computer Science, vol 13208. Springer, Cham. https://doi.org/10.1007/978-3-030-98305-5_42
DOI: https://doi.org/10.1007/978-3-030-98305-5_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-98304-8
Online ISBN: 978-3-030-98305-5
eBook Packages: Computer Science, Computer Science (R0)