TripleMIE: Multi-modal and Multi Architecture Information Extraction

  • Conference paper
Health Information Processing. Evaluation Track Papers (CHIP 2022)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1773)


Abstract

With the continuous development of deep learning, the technology is now widely applied across many fields. In medical settings, recognizing electronic vouchers is a particularly challenging task. Compared with traditional manual entry, applying OCR and NLP technology can markedly improve work efficiency and reduce the training cost of business personnel. Using OCR and NLP to digitize and structure the information on these paper materials has therefore become a focus of attention in industry.

Evaluation task 4 of CHIP 2022 (OCR identification of electronic medical paper documents, ePaper) [15, 16, 25] requires extracting 87 fields from four types of medical voucher materials: discharge summaries, outpatient invoices, drug purchase invoices, and inpatient invoices. The task is challenging because of the variety of material types, the noisy data, and the large number of target field categories.

To achieve the above goals, we propose TripleMIE, a knowledge-based multi-modal and multi-architecture medical voucher information extraction method. It comprises I2SM, an image-to-sequence model; L-SPN, a large-scale PLM-based span prediction net; MMIE, a multi-modal information extraction model; and related components. In addition, a knowledge-based model ensemble module named KME is proposed to effectively integrate prior knowledge, such as competition rules and material types, with the model outputs. With the help of these modules, we achieved excellent results on the official online test data (https://tianchi.aliyun.com/dataset/131815#4), which verifies the effectiveness of the proposed method.
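The ensemble idea behind KME can be illustrated with a minimal sketch: merge field predictions from several models by majority vote, then filter the result against prior knowledge about which fields a given material type can contain. All names here (`ALLOWED_FIELDS`, `merge_predictions`, the example field names) are hypothetical illustrations, not the paper's actual 87-field schema or implementation.

```python
# Hypothetical sketch of knowledge-based ensembling in the spirit of KME:
# majority-vote each field across model outputs, then drop fields that
# prior knowledge says cannot occur on the given material type.

from collections import Counter

# Assumed prior knowledge (illustrative only): fields permitted per material type.
ALLOWED_FIELDS = {
    "outpatient_invoice": {"hospital_name", "total_amount", "invoice_no"},
    "discharge_summary": {"hospital_name", "patient_name", "diagnosis"},
}

def merge_predictions(material_type, model_outputs):
    """Majority-vote each field's value across models, keeping only
    fields that the material type is allowed to contain."""
    allowed = ALLOWED_FIELDS.get(material_type, set())
    fields = {f for out in model_outputs for f in out}
    merged = {}
    for field in sorted(fields):
        if field not in allowed:
            continue  # prior knowledge: this field cannot appear on this material
        votes = Counter(out[field] for out in model_outputs if field in out)
        merged[field] = votes.most_common(1)[0][0]
    return merged

preds = merge_predictions(
    "outpatient_invoice",
    [
        {"hospital_name": "A Hospital", "total_amount": "120.00", "diagnosis": "flu"},
        {"hospital_name": "A Hospital", "total_amount": "126.00"},
        {"hospital_name": "A Hospita1", "total_amount": "120.00"},
    ],
)
print(preds)  # {'hospital_name': 'A Hospital', 'total_amount': '120.00'}
```

Note how the spurious `diagnosis` prediction is suppressed by the material-type rule, and the OCR-corrupted `"A Hospita1"` is outvoted; the paper's actual module additionally incorporates competition rules, which this sketch omits.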

B. Xia and S. Ma contributed equally to this work.


References

  1. Chiron, G., Doucet, A., Coustaty, M., Moreux, J.P.: ICDAR 2017 competition on post-OCR text correction. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 1423–1428. IEEE (2017)

  2. Ford, E., Carroll, J.A., Smith, H.E., Scott, D., Cassell, J.A.: Extracting information from the text of electronic medical records to improve case detection: a systematic review. J. Am. Med. Inf. Assoc. 23(5), 1007–1015 (2016)

  3. Gu, Z., et al.: XYLayoutLM: towards layout-aware multimodal networks for visually-rich document understanding. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4583–4592 (2022)

  4. Guo, Z., Li, X., Huang, H., Guo, N., Li, Q.: Deep learning-based image segmentation on multimodal medical imaging. IEEE Trans. Radiat. Plasma Med. Sci. 3(2), 162–169 (2019)

  5. Gurulingappa, H., Mateen-Rajput, A., Toldo, L.: Extraction of potential adverse drug events from medical case reports. J. Biomed. Semant. 3(1), 1–10 (2012)

  6. Hahn, U., Oleynik, M.: Medical information extraction in the age of deep learning. Yearbook Med. Inf. 29(01), 208–220 (2020)

  7. Hallett, C.: Multi-modal presentation of medical histories. In: Proceedings of the 13th International Conference on Intelligent User Interfaces, pp. 80–89 (2008)

  8. Huang, Y., Lv, T., Cui, L., Lu, Y., Wei, F.: LayoutLMv3: pre-training for document AI with unified text and image masking. In: Proceedings of the 30th ACM International Conference on Multimedia (2022)

  9. Kim, G., et al.: OCR-free document understanding transformer. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision - ECCV 2022. ECCV 2022. LNCS, vol. 13688, pp. 498–517. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19815-1_29

  10. Lewis, M., et al.: BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461 (2019)

  11. Li, M., et al.: TrOCR: transformer-based optical character recognition with pre-trained models. arXiv preprint arXiv:2109.10282 (2021)

  12. Liao, M., Wan, Z., Yao, C., Chen, K., Bai, X.: Real-time scene text detection with differentiable binarization. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 11474–11481 (2020)

  13. Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach (2019)

  14. Liu, Z., et al.: Swin Transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)

  15. Liu, L., Chang, D., Z.X., et al.: Information extraction of medical materials: an overview of the track of medical materials MedOCR. In: Health Information Processing: 8th China Conference, CHIP 2022, Hangzhou, China, 21–23 October 2022, Revised Selected Papers. Springer Nature Singapore, Singapore

  16. Liu, L., Chang, D., Z.X., et al.: MedOCR: the dataset for extraction of optical character recognition elements for medical materials. J. Med. Inf. 43(12), 28–31 (2022)

  17. Ruan, W., Appasani, N., Kim, K., Vincelli, J., Kim, H., Lee, W.S.: Pictorial visualization of EMR summary interface and medical information extraction of clinical notes. In: 2018 IEEE International Conference on Computational Intelligence and Virtual Environments for Measurement Systems and Applications (CIVEMSA), pp. 1–6. IEEE (2018)

  18. Sharma, K., Giannakos, M.: Multimodal data capabilities for learning: what can multimodal data tell us about learning? Br. J. Educ. Technol. 51(5), 1450–1484 (2020)

  19. Tan, C., Qiu, W., Chen, M., Wang, R., Huang, F.: Boundary enhanced neural span classification for nested named entity recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 9016–9023 (2020)

  20. Tang, G., et al.: MatchVIE: exploiting match relevancy between entities for visual information extraction. arXiv preprint arXiv:2106.12940 (2021)

  21. Thompson, P., McNaught, J., Ananiadou, S.: Customised OCR correction for historical medical text. In: 2015 Digital Heritage, vol. 1, pp. 35–42. IEEE (2015)

  22. Wang, W., et al.: Shape robust text detection with progressive scale expansion network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9336–9345 (2019)

  23. Xu, Y., et al.: LayoutLMv2: multi-modal pre-training for visually-rich document understanding. arXiv preprint arXiv:2012.14740 (2020)

  24. Xu, Y., et al.: LayoutXLM: multimodal pre-training for multilingual visually-rich document understanding. arXiv preprint arXiv:2104.08836 (2021)

  25. Zong, H., Lei, J., L.Z., et al.: Overview of technology evaluation dataset for medical multimodal information extraction. J. Med. Inf. 43(12), 2–5, 22 (2022)


Author information


Corresponding author

Correspondence to Hongbin Wang.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Xia, B., et al. (2023). TripleMIE: Multi-modal and Multi Architecture Information Extraction. In: Tang, B., et al. (eds.) Health Information Processing. Evaluation Track Papers. CHIP 2022. Communications in Computer and Information Science, vol 1773. Springer, Singapore. https://doi.org/10.1007/978-981-99-4826-0_14


  • DOI: https://doi.org/10.1007/978-981-99-4826-0_14


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-4825-3

  • Online ISBN: 978-981-99-4826-0

  • eBook Packages: Computer Science (R0)
