Abstract
Weakly supervised video anomaly detection (WS-VAD) is often formulated as a multiple instance learning (MIL) problem. Snippet-level anomaly scores can be predicted using only video-level annotations, but most MIL approaches focus on improving the performance of the feature learning network and ignore the method design of the preprocessing stage. MIL-based methods usually preprocess videos of different lengths into a predefined number of snippets for later anomaly identification. This is impractical for real-world videos of varying lengths when the duration of anomalous events is unknown in training. Data with different temporal resolutions generated by this division confuses the network and leads to limited detection capability. To address this issue, we propose a novel WS-VAD method. First, a temporal resolution feature mapping module (TRFM) improves the network’s learning ability for input data with different temporal resolutions by mapping the temporal resolution information into the feature learning space. We also introduce a gated recurrent unit (GRU)-based multi-scale temporal feature learning module (MS-GRU), combining GRUs with multi-scale convolutional structures and fusing features recursively at different time scales. This module exploits the ability of GRUs to extract temporal information and compensates for the fact that GRUs only extract single-scale temporal dependence. In addition, we propose the Adaptive-k module to optimize the original Top-k loss and increase flexibility in training by using the optimal number of anomalous segments k generated according to the different inputs. This approach is fully applicable to real-world videos of various lengths. Experimental results show that our model boosts the detection accuracy for data with enormous differences in temporal resolution and obtains state-of-the-art frame-level AUC performance on three real-world surveillance datasets: UCF-Crime, ShanghaiTech and XD-Violence.
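The preprocessing issue described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the default of 32 snippets, and the example scores are assumptions chosen for illustration only. The first function shows how a fixed snippet count gives videos of different lengths very different temporal resolutions (frames per snippet); the second shows generic Top-k MIL pooling with a fixed k, the quantity that an input-dependent k would replace.

```python
from typing import List


def frames_per_snippet(num_frames: int, num_snippets: int = 32) -> float:
    """Temporal resolution of one snippet when a video is divided into a
    fixed number of snippets (32 is an assumed default, not from the paper)."""
    if num_frames <= 0 or num_snippets <= 0:
        raise ValueError("num_frames and num_snippets must be positive")
    return num_frames / num_snippets


def top_k_video_score(snippet_scores: List[float], k: int) -> float:
    """Generic Top-k MIL pooling: the video-level score is the mean of the
    k highest snippet-level anomaly scores."""
    if not 1 <= k <= len(snippet_scores):
        raise ValueError("k must be between 1 and the number of snippets")
    top = sorted(snippet_scores, reverse=True)[:k]
    return sum(top) / k


if __name__ == "__main__":
    # A short clip and a long recording end up with very different resolutions.
    print(frames_per_snippet(960))     # 30.0 frames per snippet
    print(frames_per_snippet(96_000))  # 3000.0 frames per snippet

    # Hypothetical snippet scores; the pooled score depends on the choice of k.
    scores = [0.125, 0.75, 0.25, 0.5]
    print(top_k_video_score(scores, k=1))  # 0.75
    print(top_k_video_score(scores, k=2))  # 0.625
```

In this sketch a brief anomaly occupies many snippets of the short clip but only a fraction of one snippet in the long recording, so a single fixed k cannot fit both cases.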
Graphic abstract
![](http://media.springernature.com/lw685/springer-static/image/art%3A10.1007%2Fs10489-023-05072-8/MediaObjects/10489_2023_5072_Fige_HTML.png)
Data Availability
The UCF-Crime dataset is available at https://www.crcv.ucf.edu/projects/real-world/. The ShanghaiTech dataset is available at https://svip-lab.github.io/dataset/campus_dataset.html. The XD-Violence dataset is available at https://roc-ng.github.io/XD-Violence/.
Acknowledgements
We thank LetPub (www.letpub.com) for its linguistic assistance during the preparation of this manuscript. We also thank the reviewers for considerably improving the final quality of this paper.
Funding
This work was supported by the National Key Research and Development Program of China (No. 2017YFC1703302).
Author information
Contributions
Shengjun Peng: Conceptualization, Methodology, Software, Writing - original draft. Yiheng Cai: Project administration, Funding acquisition, Writing - review & editing, Formal analysis, Supervision. Zijun Yao: Data curation. Meiling Tan: Validation.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Peng, S., Cai, Y., Yao, Z. et al. Weakly-supervised video anomaly detection via temporal resolution feature learning. Appl Intell 53, 30607–30625 (2023). https://doi.org/10.1007/s10489-023-05072-8
DOI: https://doi.org/10.1007/s10489-023-05072-8