Event Recognition in Laparoscopic Gynecology Videos with Hybrid Transformers

  • Conference paper
MultiMedia Modeling (MMM 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14565)

Abstract

Analyzing laparoscopic surgery videos presents a complex and multifaceted challenge, with applications including surgical training, intra-operative surgical complication prediction, and post-operative surgical assessment. Identifying crucial events within these videos is a significant prerequisite for the majority of these applications. In this paper, we introduce a comprehensive dataset tailored for relevant event recognition in laparoscopic gynecology videos. Our dataset includes annotations for critical events associated with major intra-operative challenges and post-operative complications. To validate the precision of our annotations, we assess event recognition performance using several CNN-RNN architectures. Furthermore, we introduce and evaluate a hybrid transformer architecture coupled with a customized training-inference framework to recognize four specific events in laparoscopic surgery videos. Leveraging Transformer networks, our proposed architecture harnesses inter-frame dependencies to counteract the adverse effects of relevant content occlusion, motion blur, and surgical scene variation, thus significantly enhancing event recognition accuracy. Moreover, we present a frame sampling strategy designed to manage variations in surgical scenes and in surgeons’ skill levels, resulting in event recognition with high temporal resolution. Through a series of extensive experiments, we empirically demonstrate the superiority of our proposed methodology in event recognition over conventional CNN-RNN architectures.
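
The abstract does not spell out the frame sampling strategy, so the following is only a minimal sketch of one common way to sample frames from a clip, not the authors' method. It assumes a generic segment-based scheme: the clip is split into equal segments and one frame is taken per segment, at random during training and at the segment midpoint during inference. The function name `sample_frame_indices` and all of its parameters are hypothetical.

```python
import random
from typing import List, Optional


def sample_frame_indices(
    clip_start: int,
    clip_length: int,
    num_frames: int,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Pick one frame index from each of `num_frames` equal segments of a clip.

    Assumed scheme (not taken from the paper): with `rng` given (training),
    the index is drawn at random inside each segment; with `rng=None`
    (inference), the middle frame of each segment is used.
    """
    if num_frames <= 0 or clip_length < num_frames:
        raise ValueError("clip must contain at least num_frames frames")
    indices = []
    for k in range(num_frames):
        # Integer segment bounds [lo, hi) that tile the clip without gaps.
        lo = clip_start + (k * clip_length) // num_frames
        hi = clip_start + ((k + 1) * clip_length) // num_frames
        if rng is None:
            indices.append((lo + hi - 1) // 2)  # deterministic midpoint
        else:
            indices.append(rng.randrange(lo, hi))  # random frame in segment
    return indices


# Inference-style sampling: 3 frames from a 90-frame clip starting at frame 0.
print(sample_frame_indices(0, 90, 3))
```

Random offsets within each segment expose a model to slightly different frames of the same clip across epochs, while the deterministic midpoint keeps inference reproducible. The frame stacks selected this way would then be passed to whatever per-frame feature extractor and temporal model is in use.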

Acknowledgement

This work was funded by the FWF Austrian Science Fund under grant P 32010-N38. The authors would like to thank Daniela Stefanics for her help with data annotations.

Corresponding author

Correspondence to Sahar Nasirihaghighi.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Nasirihaghighi, S., Ghamsarian, N., Husslein, H., Schoeffmann, K. (2024). Event Recognition in Laparoscopic Gynecology Videos with Hybrid Transformers. In: Rudinac, S., et al. MultiMedia Modeling. MMM 2024. Lecture Notes in Computer Science, vol 14565. Springer, Cham. https://doi.org/10.1007/978-3-031-56435-2_7

  • DOI: https://doi.org/10.1007/978-3-031-56435-2_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-56434-5

  • Online ISBN: 978-3-031-56435-2

  • eBook Packages: Computer Science, Computer Science (R0)
