Abstract
With the continuous development of deep learning, the technology has been widely applied across many fields. In medical scenarios, recognizing electronic vouchers is a very challenging task. Compared with traditional manual entry, applying OCR and NLP techniques can effectively improve work efficiency and reduce the training cost of business personnel. Using OCR and NLP to digitize and structure the information on these paper materials has therefore become a hot topic in the industry.
Evaluation Task 4 of CHIP 2022, OCR recognition of electronic medical paper documents (ePaper) [15, 16, 25], requires extracting 87 fields from four types of medical voucher materials: discharge summaries, outpatient invoices, drug purchase invoices, and inpatient invoices. The task is very challenging because of the variety of material types, the noise contained in the data, and the large number of target fields.
To this end, we propose TripleMIE, a knowledge-based multi-modal and multi-architecture medical voucher information extraction method. It comprises I2SM, an image-to-sequence model; L-SPN, a large-scale PLM-based span prediction network; and MMIE, a multi-modal information extraction model. On top of these, we propose a knowledge-based model ensemble module named KME that effectively integrates prior knowledge, such as competition rules and material types, with the model outputs. With these modules, we achieved excellent results on the official online test data, which verifies the effectiveness of the proposed method (dataset: https://tianchi.aliyun.com/dataset/131815#4).
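To make the ensemble idea concrete, the following minimal Python sketch shows one way a KME-style knowledge-based merge could work: per-field majority voting over the outputs of several models, followed by rule-based filtering using prior knowledge about which fields can appear on a given material type. All names here (kme_ensemble, FIELDS_BY_TYPE, the field and type labels) are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of a KME-style knowledge-based ensemble.
# The field/type table and all identifiers are illustrative assumptions.
from collections import Counter

# Assumed prior knowledge: which fields are valid for each material type.
FIELDS_BY_TYPE = {
    "discharge_summary": {"patient_name", "admission_date", "diagnosis"},
    "outpatient_invoice": {"patient_name", "total_amount", "invoice_no"},
}

def kme_ensemble(material_type, model_outputs):
    """Merge per-field predictions from several models by majority vote,
    then drop any field that the prior knowledge says cannot occur on
    this material type."""
    valid_fields = FIELDS_BY_TYPE.get(material_type, set())
    merged = {}
    # Collect every field that any model predicted.
    all_fields = set().union(*(out.keys() for out in model_outputs))
    for field in all_fields:
        if field not in valid_fields:
            continue  # rule-based filtering from prior knowledge
        # Majority vote over the models that predicted this field.
        votes = Counter(out[field] for out in model_outputs if field in out)
        value, _count = votes.most_common(1)[0]
        merged[field] = value
    return merged

# Example: three models (playing the roles of I2SM, L-SPN, and MMIE)
# disagree on two fields; the vote resolves both.
outputs = [
    {"patient_name": "张三", "total_amount": "120.00"},
    {"patient_name": "张三", "total_amount": "126.00", "invoice_no": "001"},
    {"patient_name": "张二", "total_amount": "120.00", "invoice_no": "001"},
]
print(kme_ensemble("outpatient_invoice", outputs))
# {'patient_name': '张三', 'total_amount': '120.00', 'invoice_no': '001'}
```

In a real system the filtering rules would presumably be derived from the competition specification and a material-type classifier rather than a hand-written table.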
B. Xia and S. Ma contributed equally to this work.
References
Chiron, G., Doucet, A., Coustaty, M., Moreux, J.P.: ICDAR 2017 competition on post-OCR text correction. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 1423–1428. IEEE (2017)
Ford, E., Carroll, J.A., Smith, H.E., Scott, D., Cassell, J.A.: Extracting information from the text of electronic medical records to improve case detection: a systematic review. J. Am. Med. Inf. Assoc. 23(5), 1007–1015 (2016)
Gu, Z., et al.: XYLayoutLM: towards layout-aware multimodal networks for visually-rich document understanding. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4583–4592 (2022)
Guo, Z., Li, X., Huang, H., Guo, N., Li, Q.: Deep learning-based image segmentation on multimodal medical imaging. IEEE Trans. Radiat. Plasma Med. Sci. 3(2), 162–169 (2019)
Gurulingappa, H., Mateen-Rajput, A., Toldo, L.: Extraction of potential adverse drug events from medical case reports. J. Biomed. Semant. 3(1), 1–10 (2012)
Hahn, U., Oleynik, M.: Medical information extraction in the age of deep learning. Yearbook Med. Inf. 29(01), 208–220 (2020)
Hallett, C.: Multi-modal presentation of medical histories. In: Proceedings of the 13th International Conference on Intelligent User Interfaces, pp. 80–89 (2008)
Huang, Y., Lv, T., Cui, L., Lu, Y., Wei, F.: LayoutLMv3: pre-training for document AI with unified text and image masking. In: Proceedings of the 30th ACM International Conference on Multimedia (2022)
Kim, G., et al.: OCR-free document understanding transformer. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision - ECCV 2022. LNCS, vol. 13688, pp. 498–517. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19815-1_29
Lewis, M., et al.: BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461 (2019)
Li, M., et al.: TrOCR: transformer-based optical character recognition with pre-trained models. arXiv preprint arXiv:2109.10282 (2021)
Liao, M., Wan, Z., Yao, C., Chen, K., Bai, X.: Real-time scene text detection with differentiable binarization. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 11474–11481 (2020)
Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019)
Liu, Z., et al.: Swin Transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
Liu, L., Chang, D., et al.: Information extraction of medical materials: an overview of the track of medical materials MedOCR. In: Health Information Processing: 8th China Conference, CHIP 2022, Hangzhou, China, 21–23 October 2022, Revised Selected Papers. Springer Nature Singapore, Singapore (2022)
Liu, L., Chang, D., et al.: MedOCR: the dataset for extraction of optical character recognition elements for medical materials. J. Med. Inf. 43(12), 28–31 (2022)
Ruan, W., Appasani, N., Kim, K., Vincelli, J., Kim, H., Lee, W.S.: Pictorial visualization of EMR summary interface and medical information extraction of clinical notes. In: 2018 IEEE International Conference on Computational Intelligence and Virtual Environments for Measurement Systems and Applications (CIVEMSA), pp. 1–6. IEEE (2018)
Sharma, K., Giannakos, M.: Multimodal data capabilities for learning: what can multimodal data tell us about learning? Br. J. Educ. Technol. 51(5), 1450–1484 (2020)
Tan, C., Qiu, W., Chen, M., Wang, R., Huang, F.: Boundary enhanced neural span classification for nested named entity recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 9016–9023 (2020)
Tang, G., et al.: MatchVIE: exploiting match relevancy between entities for visual information extraction. arXiv preprint arXiv:2106.12940 (2021)
Thompson, P., McNaught, J., Ananiadou, S.: Customised OCR correction for historical medical text. In: 2015 Digital Heritage, vol. 1, pp. 35–42. IEEE (2015)
Wang, W., et al.: Shape robust text detection with progressive scale expansion network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9336–9345 (2019)
Xu, Y., et al.: LayoutLMv2: multi-modal pre-training for visually-rich document understanding. arXiv preprint arXiv:2012.14740 (2020)
Xu, Y., et al.: LayoutXLM: multimodal pre-training for multilingual visually-rich document understanding. arXiv preprint arXiv:2104.08836 (2021)
Zong, H., Lei, J., et al.: Overview of the technology evaluation dataset for medical multimodal information extraction. J. Med. Inf. 43(12), 2–5, 22 (2022)