Abstract
With the continuous development of deep learning, the technology has been widely applied across many fields. In medical scenarios, recognizing electronic vouchers is a very challenging task. Compared with traditional manual entry, applying OCR and NLP techniques can effectively improve work efficiency and reduce the training cost of business personnel. Using OCR and NLP to digitize and structure the information on these paper materials has therefore become a hot topic in the industry.
Evaluation Task 4 of CHIP 2022, OCR recognition of electronic medical paper documents (ePaper) [15, 16, 25], requires extracting 87 fields from four types of medical voucher materials: discharge summaries, outpatient invoices, drug purchase invoices, and inpatient invoices. The task is very challenging because of the variety of material types, the noise contained in the data, and the large number of target fields.
To this end, we propose TripleMIE, a knowledge-based multi-modal and multi-architecture medical voucher information extraction method. It comprises I2SM, an image-to-sequence model; L-SPN, a large-scale PLM-based span prediction network; and MMIE, a multi-modal information extraction model. On top of these, we propose a knowledge-based model ensemble module named KME that effectively integrates prior knowledge, such as competition rules and material types, with the model outputs. With these modules, we achieved excellent results on the official online test data, which verifies the effectiveness of the proposed method (dataset: https://tianchi.aliyun.com/dataset/131815#4).
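To make the ensemble idea concrete, the following minimal Python sketch shows one way a KME-style knowledge-based merge could work: per-field majority voting over the outputs of several models, followed by rule-based filtering using prior knowledge about which fields can appear on a given material type. All names here (kme_ensemble, FIELDS_BY_TYPE, the field and type labels) are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of a KME-style knowledge-based ensemble.
# The field/type table and all identifiers are illustrative assumptions.
from collections import Counter

# Assumed prior knowledge: which fields are valid for each material type.
FIELDS_BY_TYPE = {
    "discharge_summary": {"patient_name", "admission_date", "diagnosis"},
    "outpatient_invoice": {"patient_name", "total_amount", "invoice_no"},
}

def kme_ensemble(material_type, model_outputs):
    """Merge per-field predictions from several models by majority vote,
    then drop any field that the prior knowledge says cannot occur on
    this material type."""
    valid_fields = FIELDS_BY_TYPE.get(material_type, set())
    merged = {}
    # Collect every field that any model predicted.
    all_fields = set().union(*(out.keys() for out in model_outputs))
    for field in all_fields:
        if field not in valid_fields:
            continue  # rule-based filtering from prior knowledge
        # Majority vote over the models that predicted this field.
        votes = Counter(out[field] for out in model_outputs if field in out)
        value, _count = votes.most_common(1)[0]
        merged[field] = value
    return merged

# Example: three models (playing the roles of I2SM, L-SPN, and MMIE)
# disagree on two fields; the vote resolves both.
outputs = [
    {"patient_name": "张三", "total_amount": "120.00"},
    {"patient_name": "张三", "total_amount": "126.00", "invoice_no": "001"},
    {"patient_name": "张二", "total_amount": "120.00", "invoice_no": "001"},
]
print(kme_ensemble("outpatient_invoice", outputs))
# {'patient_name': '张三', 'total_amount': '120.00', 'invoice_no': '001'}
```

In a real system the filtering rules would presumably be derived from the competition specification and a material-type classifier rather than a hand-written table.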
B. Xia and S. Ma contributed equally to this work.
References
Chiron, G., Doucet, A., Coustaty, M., Moreux, J.P.: ICDAR 2017 competition on post-OCR text correction. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 1423–1428. IEEE (2017)
Ford, E., Carroll, J.A., Smith, H.E., Scott, D., Cassell, J.A.: Extracting information from the text of electronic medical records to improve case detection: a systematic review. J. Am. Med. Inf. Assoc. 23(5), 1007–1015 (2016)
Gu, Z., et al.: XYLayoutLM: towards layout-aware multimodal networks for visually-rich document understanding. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4583–4592 (2022)
Guo, Z., Li, X., Huang, H., Guo, N., Li, Q.: Deep learning-based image segmentation on multimodal medical imaging. IEEE Trans. Radiat. Plasma Med. Sci. 3(2), 162–169 (2019)
Gurulingappa, H., Mateen-Rajput, A., Toldo, L.: Extraction of potential adverse drug events from medical case reports. J. Biomed. Semant. 3(1), 1–10 (2012)
Hahn, U., Oleynik, M.: Medical information extraction in the age of deep learning. Yearbook Med. Inf. 29(01), 208–220 (2020)
Hallett, C.: Multi-modal presentation of medical histories. In: Proceedings of the 13th International Conference on Intelligent User Interfaces, pp. 80–89 (2008)
Huang, Y., Lv, T., Cui, L., Lu, Y., Wei, F.: LayoutLMv3: pre-training for document AI with unified text and image masking. In: Proceedings of the 30th ACM International Conference on Multimedia (2022)
Kim, G., et al.: OCR-free document understanding transformer. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision - ECCV 2022. LNCS, vol. 13688, pp. 498–517. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19815-1_29
Lewis, M., et al.: BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461 (2019)
Li, M., et al.: TrOCR: transformer-based optical character recognition with pre-trained models. arXiv preprint arXiv:2109.10282 (2021)
Liao, M., Wan, Z., Yao, C., Chen, K., Bai, X.: Real-time scene text detection with differentiable binarization. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 11474–11481 (2020)
Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019)
Liu, Z., et al.: Swin Transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
Liu, L., Chang, D., et al.: Information extraction of medical materials: an overview of the track of medical materials MedOCR. In: Health Information Processing: 8th China Conference, CHIP 2022, Hangzhou, China, 21–23 October 2022, Revised Selected Papers. Springer Nature Singapore, Singapore (2022)
Liu, L., Chang, D., et al.: MedOCR: the dataset for extraction of optical character recognition elements for medical materials. J. Med. Inf. 43(12), 28–31 (2022)
Ruan, W., Appasani, N., Kim, K., Vincelli, J., Kim, H., Lee, W.S.: Pictorial visualization of EMR summary interface and medical information extraction of clinical notes. In: 2018 IEEE International Conference on Computational Intelligence and Virtual Environments for Measurement Systems and Applications (CIVEMSA), pp. 1–6. IEEE (2018)
Sharma, K., Giannakos, M.: Multimodal data capabilities for learning: what can multimodal data tell us about learning? Br. J. Educ. Technol. 51(5), 1450–1484 (2020)
Tan, C., Qiu, W., Chen, M., Wang, R., Huang, F.: Boundary enhanced neural span classification for nested named entity recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 9016–9023 (2020)
Tang, G., et al.: MatchVIE: exploiting match relevancy between entities for visual information extraction. arXiv preprint arXiv:2106.12940 (2021)
Thompson, P., McNaught, J., Ananiadou, S.: Customised OCR correction for historical medical text. In: 2015 Digital Heritage, vol. 1, pp. 35–42. IEEE (2015)
Wang, W., et al.: Shape robust text detection with progressive scale expansion network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9336–9345 (2019)
Xu, Y., et al.: LayoutLMv2: multi-modal pre-training for visually-rich document understanding. arXiv preprint arXiv:2012.14740 (2020)
Xu, Y., et al.: LayoutXLM: multimodal pre-training for multilingual visually-rich document understanding. arXiv preprint arXiv:2104.08836 (2021)
Zong, H., Lei, J., et al.: Overview of the technology evaluation dataset for medical multimodal information extraction. J. Med. Inf. 43(12), 2–5, 22 (2022)