Weakly-supervised video anomaly detection via temporal resolution feature learning

Applied Intelligence

Abstract

Weakly supervised video anomaly detection (WS-VAD) is often formulated as a multiple instance learning (MIL) problem. Snippet-level anomaly scores can be predicted using only video-level annotations, but most MIL approaches focus on improving the feature learning network and neglect the design of the preprocessing stage. MIL-based methods usually preprocess videos of different lengths into a predefined number of snippets for later anomaly identification. This is impractical for real-world videos of varying lengths, where the duration of anomalous events is unknown during training. The division produces data with different temporal resolutions, which confuses the network and limits its detection capability. To address this issue, we propose a novel WS-VAD method. First, a temporal resolution feature mapping module (TRFM) improves the network’s ability to learn from input data with different temporal resolutions by mapping the temporal resolution information into the feature learning space. We also introduce a gated recurrent unit (GRU)-based multi-scale temporal feature learning module (MS-GRU), which combines GRUs with multi-scale convolutional structures and fuses features recursively at different time scales. This module exploits the ability of GRUs to extract temporal information and compensates for the fact that GRUs extract only single-scale temporal dependence. In addition, we propose the Adaptive-k module to optimize the original Top-k loss and increase flexibility in training by using an optimal number of anomalous segments k generated according to each input. This approach is fully applicable to real-world videos of various lengths. Experimental results show that our model boosts detection accuracy for data with large differences in temporal resolution and obtains state-of-the-art frame-level AUC performance on three real-world surveillance datasets: UCF-Crime, ShanghaiTech and XD-Violence.
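The preprocessing issue and the Top-k scoring described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the default of 32 snippets, and the fixed k are assumptions made for illustration, and the rule that the Adaptive-k module uses to choose k is not given in this abstract, so it is not reproduced here.

```python
from statistics import mean


def split_into_snippets(num_clips, num_snippets=32):
    """Return (start, end) clip-index ranges for a fixed number of snippets.

    Hypothetical helper: boundaries are evenly spaced. When a video has
    fewer clips than snippets, an empty range falls back to a single clip
    so that every snippet is non-empty.
    """
    bounds = [i * num_clips // num_snippets for i in range(num_snippets + 1)]
    ranges = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        if end <= start:  # empty bin: reuse one existing clip
            start = min(start, num_clips - 1)
            end = start + 1
        ranges.append((start, end))
    return ranges


def clips_per_snippet(num_clips, num_snippets=32):
    """Temporal resolution of one snippet, measured in clips per snippet."""
    return num_clips / num_snippets


def top_k_video_score(snippet_scores, k):
    """Video-level score as the mean of the k largest snippet scores.

    k is passed in by the caller; an adaptive method would derive it
    from the input instead of fixing it in advance.
    """
    k = max(1, min(k, len(snippet_scores)))
    return mean(sorted(snippet_scores, reverse=True)[:k])


if __name__ == "__main__":
    # Two videos of very different lengths end up with the same number of
    # snippets, but each snippet covers a very different amount of time.
    for num_clips in (64, 3200):
        ranges = split_into_snippets(num_clips)
        print(num_clips, len(ranges), clips_per_snippet(num_clips))

    # Top-k aggregation of snippet-level anomaly scores with a fixed k.
    print(top_k_video_score([0.1, 0.9, 0.8, 0.2], k=2))
```

With a fixed snippet count, a long video packs many more clips into each snippet than a short one, which is the variation in temporal resolution that the abstract identifies as a source of confusion for the network.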

Data Availability

The UCF-Crime dataset is available at https://www.crcv.ucf.edu/projects/real-world/. The ShanghaiTech dataset is available at https://svip-lab.github.io/dataset/campus_dataset.html. The XD-Violence dataset is available at https://roc-ng.github.io/XD-Violence/.

Acknowledgements

We thank LetPub (www.letpub.com) for its linguistic assistance during the preparation of this manuscript. We also thank the reviewers for considerably improving the final quality of this paper.

Funding

This work was supported by the National Key Research and Development Program of China (No. 2017YFC1703302).

Author information

Contributions

Shengjun Peng: Conceptualization, Methodology, Software, Writing - original draft. Yiheng Cai: Project administration, Funding acquisition, Writing - review & editing, Formal analysis, Supervision. Zijun Yao: Data curation. Meiling Tan: Validation.

Corresponding author

Correspondence to Yiheng Cai.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Peng, S., Cai, Y., Yao, Z. et al. Weakly-supervised video anomaly detection via temporal resolution feature learning. Appl Intell 53, 30607–30625 (2023). https://doi.org/10.1007/s10489-023-05072-8

  • DOI: https://doi.org/10.1007/s10489-023-05072-8
