TPNet: Enhancing Weakly Supervised Polyp Frame Detection with Temporal Encoder and Prototype-Based Memory Bank

Gao, Jianzhe; Luo, Zhiming; Tian, Cheng; Li, Shaozi

doi:10.1007/978-981-99-8555-5_37

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14436)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Abstract

Polyp detection plays a crucial role in the early prevention of colorectal cancer. The availability of large-scale polyp video datasets and video-level annotations has spurred research efforts to formulate polyp detection as a weakly-supervised anomaly detection task, which uses training data labeled only at the video level to detect polyps at the frame level. However, few studies have investigated the impact of specific properties of polyp videos, including temporal dynamics, ambiguity, and complex noise. In this work, we propose TPNet, a novel framework for weakly-supervised polyp frame detection that addresses several challenges posed by colonoscopy videos. Specifically, we design a Temporal Encoder that effectively captures the temporal dynamics and intricate patterns within polyp video segments to improve accuracy. Additionally, we introduce a Prototype-based Memory Bank that facilitates the storage and retrieval of significant discriminative information, which enhances sensitivity and robustness in ambiguous and complicated conditions. Experiments conducted on one of the largest and most challenging colonoscopy datasets demonstrate that our proposed TPNet achieves state-of-the-art performance, surpassing the latest cutting-edge method by 6.19% in average precision (AP).
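
To make the weakly-supervised setting and the reported metric concrete, the following is a minimal sketch and not the TPNet implementation. It assumes a generic multiple-instance formulation in which per-frame scores are pooled into one video-level score by averaging the top-k frames (a common choice in weakly-supervised anomaly detection, not confirmed here as the authors' method), and it computes frame-level average precision in the standard ranking-based way. The function names `video_score` and `average_precision` and the parameter `k` are hypothetical.

```python
from typing import Sequence


def video_score(frame_scores: Sequence[float], k: int = 3) -> float:
    """Pool frame scores into one video-level score (mean of the top-k).

    Assumption: top-k mean pooling stands in for whatever aggregation a
    real model uses; only the video-level label supervises this value.
    """
    if not frame_scores:
        raise ValueError("frame_scores must not be empty")
    top = sorted(frame_scores, reverse=True)[:k]
    return sum(top) / len(top)


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Frame-level AP: mean of the precision at the rank of each positive frame."""
    # Rank frames from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precisions.append(hits / rank)
    # With no positive frames, AP is undefined; this sketch returns 0.0.
    return sum(precisions) / len(precisions) if precisions else 0.0


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.3, 0.6]   # hypothetical per-frame polyp scores
    labels = [1, 0, 0, 1]           # hypothetical frame-level ground truth
    print(video_score(scores, k=2))
    print(average_precision(scores, labels))
```

At training time only `video_score` would be compared against a label; the frame-level labels passed to `average_precision` are needed solely for evaluation.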

Acknowledgement

This work is supported by the National Natural Science Foundation of China (No. 62276221), the Natural Science Foundation of Fujian Province of China (No. 2022J01002), and the Science and Technology Plan Project of Xiamen (No. 3502Z20221025).

Author information

Authors and Affiliations

Department of Artificial Intelligence, Xiamen University, Xiamen, Fujian, China
Jianzhe Gao, Zhiming Luo, Cheng Tian & Shaozi Li

Corresponding author

Correspondence to Zhiming Luo.

Editor information

Editors and Affiliations

Nanjing University of Information Science and Technology, Nanjing, China
Qingshan Liu
Xiamen University, Xiamen, China
Hanzi Wang
Beijing University of Posts and Telecommunications, Beijing, China
Zhanyu Ma
Sun Yat-sen University, Guangzhou, China
Weishi Zheng
Peking University, Beijing, China
Hongbin Zha
Chinese Academy of Sciences, Beijing, China
Xilin Chen
Chinese Academy of Sciences, Beijing, China
Liang Wang
Xiamen University, Xiamen, China
Rongrong Ji

About this paper

Cite this paper

Gao, J., Luo, Z., Tian, C., Li, S. (2024). TPNet: Enhancing Weakly Supervised Polyp Frame Detection with Temporal Encoder and Prototype-Based Memory Bank. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14436. Springer, Singapore. https://doi.org/10.1007/978-981-99-8555-5_37

DOI: https://doi.org/10.1007/978-981-99-8555-5_37
Published: 28 December 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8554-8
Online ISBN: 978-981-99-8555-5
eBook Packages: Computer Science, Computer Science (R0)