Event Recognition in Laparoscopic Gynecology Videos with Hybrid Transformers

Nasirihaghighi, Sahar; Ghamsarian, Negin; Husslein, Heinrich; Schoeffmann, Klaus

doi:10.1007/978-3-031-56435-2_7

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14565)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

Analyzing laparoscopic surgery videos is a complex, multifaceted challenge with applications including surgical training, intra-operative surgical complication prediction, and post-operative surgical assessment. Identifying crucial events within these videos is a key prerequisite for most of these applications. In this paper, we introduce a comprehensive dataset tailored for relevant event recognition in laparoscopic gynecology videos. Our dataset includes annotations for critical events associated with major intra-operative challenges and post-operative complications. To validate the precision of our annotations, we assess event recognition performance using several CNN-RNN architectures. Furthermore, we introduce and evaluate a hybrid transformer architecture coupled with a customized training-inference framework to recognize four specific events in laparoscopic surgery videos. By leveraging Transformer networks, our proposed architecture exploits inter-frame dependencies to counteract the adverse effects of relevant-content occlusion, motion blur, and surgical scene variation, thus significantly enhancing event recognition accuracy. Moreover, we present a frame sampling strategy designed to handle variations in surgical scenes and in surgeons' skill levels, enabling event recognition with high temporal resolution. Through a series of extensive experiments, we empirically demonstrate the superiority of our proposed methodology over conventional CNN-RNN architectures for event recognition.
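The abstract says the Transformer component exploits inter-frame dependencies to compensate for occlusion and motion blur. As a rough illustration of that general mechanism only (not the authors' architecture, whose details are not given on this page), the sketch below applies single-head scaled dot-product self-attention across a clip of per-frame feature vectors, such as those a CNN backbone might produce. Assumptions: features are plain Python lists, queries, keys, and values are the raw features with no learned projections, and the function names `softmax` and `temporal_self_attention` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def temporal_self_attention(frames: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention over a clip of frame features.

    Hypothetical simplification: queries, keys and values are the raw frame
    features (no learned projection matrices), so the sketch has no dependencies.
    """
    d = len(frames[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for q in frames:
        # Similarity of this frame to every frame in the clip, including itself.
        weights = softmax([dot(q, k) * scale for k in frames])
        # Each output is a weighted average of all frame features in the clip.
        out = [sum(w * v[j] for w, v in zip(weights, frames)) for j in range(d)]
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    # Toy clip of four 3-dimensional frame features; frame 2 is "occluded" (all zeros).
    clip = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
    attended = temporal_self_attention(clip)
    print(attended[2])  # [0.75, 0.0, 0.0]
```

Because the occluded frame's all-zero query scores every frame equally, its attention weights are uniform and its output becomes the clip average, `[0.75, 0.0, 0.0]`: it borrows context from neighbouring frames instead of passing on an empty representation. A trained model would instead learn query, key, and value projections and typically stack several such layers.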

Acknowledgement

This work was funded by the FWF Austrian Science Fund under grant P 32010-N38. The authors would like to thank Daniela Stefanics for her help with data annotations.

Author information

Authors and Affiliations

Institute of Information Technology (ITEC), Klagenfurt University, Klagenfurt, Austria
Sahar Nasirihaghighi & Klaus Schoeffmann
Center for AI in Medicine, University of Bern, Bern, Switzerland
Negin Ghamsarian
Department of Gynecology and Gynecological Oncology, Medical University Vienna, Vienna, Austria
Heinrich Husslein

Corresponding author

Correspondence to Sahar Nasirihaghighi.

Editor information

Editors and Affiliations

University of Amsterdam, Amsterdam, The Netherlands
Stevan Rudinac
Delft University of Technology, Delft, The Netherlands
Alan Hanjalic
Delft University of Technology, Delft, The Netherlands
Cynthia Liem
University of Amsterdam, Amsterdam, The Netherlands
Marcel Worring
Reykjavik University, Reykjavik, Iceland
Björn Þór Jónsson
Microsoft Research Lab – Asia, Beijing, China
Bei Liu
The University of Tokyo, Tokyo, Japan
Yoko Yamakata

About this paper

Cite this paper

Nasirihaghighi, S., Ghamsarian, N., Husslein, H., Schoeffmann, K. (2024). Event Recognition in Laparoscopic Gynecology Videos with Hybrid Transformers. In: Rudinac, S., et al. MultiMedia Modeling. MMM 2024. Lecture Notes in Computer Science, vol 14565. Springer, Cham. https://doi.org/10.1007/978-3-031-56435-2_7

DOI: https://doi.org/10.1007/978-3-031-56435-2_7
Published: 20 March 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-56434-5
Online ISBN: 978-3-031-56435-2
eBook Packages: Computer Science, Computer Science (R0)
