Abstract
Although capitalisation is an important feature for the Named Entity Recognition (NER) task, NER input data is not always cased. Recent studies suggest two main methods of dealing with such inconsistency: truecasing and training a model on a modified dataset. Furthermore, when developing virtual assistants, there is often a need to support interaction in several languages. It has been shown that multilingual BERT can be successfully used for cross-lingual transfer, achieving scores on datasets in various languages that are comparable to those obtained with language-specific models. In this paper, we address the task of Named Entity Recognition on inconsistently capitalised data in English and Russian. We demonstrate that using multilingual BERT trained on a concatenation of the original and lowercased datasets is the most effective way to solve the task. Our model achieves the highest average result on the CoNLL-2003 and Collection 3 datasets while being robust to missing casing.
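The data preparation mentioned in the abstract, training on a concatenation of the original and lowercased datasets, can be illustrated with a minimal sketch. It assumes a dataset held in memory as sentences of (token, tag) pairs with BIO tags; the function names and the toy sentences are hypothetical and are not taken from the paper, CoNLL-2003, or Collection 3.

```python
from typing import List, Tuple

# Assumed representation: a sentence is a list of (token, tag) pairs.
Sentence = List[Tuple[str, str]]


def lowercase_sentence(sentence: Sentence) -> Sentence:
    """Return a copy with every token lowercased; tags stay unchanged."""
    return [(token.lower(), tag) for token, tag in sentence]


def concat_original_and_lowered(dataset: List[Sentence]) -> List[Sentence]:
    """Concatenate the original dataset with its lowercased copy."""
    return dataset + [lowercase_sentence(s) for s in dataset]


# Toy sentences for illustration only (hypothetical data).
toy = [
    [("Anna", "B-PER"), ("lives", "O"), ("in", "O"), ("Moscow", "B-LOC")],
    [("Анна", "B-PER"), ("живёт", "O"), ("в", "O"), ("Москве", "B-LOC")],
]

augmented = concat_original_and_lowered(toy)
for sentence in augmented:
    print(" ".join(f"{token}/{tag}" for token, tag in sentence))
```

The augmented set is twice the size of the original, so a model trained on it sees each entity both with and without capitalisation. How the paper shuffles, batches, or tokenises this data is not stated in the abstract and is left out of the sketch.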
Acknowledgments
This work was supported by a grant for research centers in the field of artificial intelligence, provided by the Analytical Center for the Government of the Russian Federation under the subsidy agreement (agreement identifier 000000D730321P5Q0002) and the agreement with the Moscow Institute of Physics and Technology dated November 1, 2021 No. 70-2021-00138.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chizhikova, A., Konovalov, V., Burtsev, M. (2023). Multilingual Case-Insensitive Named Entity Recognition. In: Kryzhanovsky, B., Dunin-Barkowski, W., Redko, V., Tiumentsev, Y. (eds) Advances in Neural Computation, Machine Learning, and Cognitive Research VI. NEUROINFORMATICS 2022. Studies in Computational Intelligence, vol 1064. Springer, Cham. https://doi.org/10.1007/978-3-031-19032-2_46
DOI: https://doi.org/10.1007/978-3-031-19032-2_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19031-5
Online ISBN: 978-3-031-19032-2
eBook Packages: Intelligent Technologies and Robotics (R0)