Abstract
Video anomaly detection typically involves identifying anomalous targets, behaviors, and events in surveillance video using only normal samples for training. Most mainstream anomaly detection models train an encoder-decoder network exclusively on normal samples and flag frames with larger reconstruction errors as anomalies. The challenge with such methods lies in controlling how well the reconstruction model generalizes to anomalous samples and in the bias of reconstruction maps towards small-scale anomalies. To address these issues, we propose a triple-stream framework for anomaly detection that combines cross-prediction agent tasks with multiple local probabilistic models. We incorporate a dual learning mechanism in both the appearance and motion channels, allowing mutual feedback so that the model overfits to normal samples and correspondingly generalizes less well to anomalous samples. Additionally, we apply an attention mechanism to the network, design a feature consistency function to constrain bias to local features, and construct a probability model for each local region to detect larger-scale anomalies. Finally, we design a fusion scheme to evaluate anomaly scores for video frames. Evaluations on popular benchmark datasets, including UCSD, Avenue, and Street Scene, demonstrate that our proposed model achieves competitive performance compared to state-of-the-art methods.
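As a rough illustration of the scoring idea summarized in the abstract, the sketch below computes a frame-level reconstruction error, fits one Gaussian per local region on errors from normal frames, and fuses the two scores. It is not the authors' network or their fusion scheme: the function names, the equal-chunk region split, the z-score region model, and the fusion weight of 0.5 are all assumptions made for illustration, and the pixel values are toy data.

```python
from statistics import mean, pstdev

def mse(a, b):
    """Mean squared error between two equally sized flat pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def region_errors(frame, recon, n_regions):
    """Split flat pixel lists into equal chunks and compute the MSE of each."""
    size = len(frame) // n_regions
    return [mse(frame[i * size:(i + 1) * size], recon[i * size:(i + 1) * size])
            for i in range(n_regions)]

def fit_region_models(train_errors):
    """Fit one Gaussian (mean, std) per region; train_errors[t][r] is the
    error of region r in normal training frame t."""
    models = []
    for r in range(len(train_errors[0])):
        column = [frame[r] for frame in train_errors]
        # Floor the std so a constant region does not cause division by zero.
        models.append((mean(column), max(pstdev(column), 1e-6)))
    return models

def region_score(errors, models):
    """Largest positive z-score over regions (the most unusual region)."""
    return max(max(0.0, (e - m) / s) for e, (m, s) in zip(errors, models))

def min_max(scores):
    """Rescale scores to [0, 1] within one video; constant input maps to 0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def fuse(global_scores, local_scores, weight=0.5):
    """Weighted sum of normalized scores; weight=0.5 is an assumed value."""
    g, l = min_max(global_scores), min_max(local_scores)
    return [weight * a + (1 - weight) * b for a, b in zip(g, l)]

# Toy data: 4-pixel frames split into 2 regions (hypothetical values).
train_frames = [[0.0] * 4] * 3
train_recons = [[0.1, 0.1, 0.1, 0.1], [0.2, 0.0, 0.1, 0.1], [0.0, 0.2, 0.2, 0.0]]
models = fit_region_models(
    [region_errors(f, r, 2) for f, r in zip(train_frames, train_recons)])

test_frames = [[0.0] * 4] * 3
test_recons = [[0.1, 0.1, 0.1, 0.1], [0.1, 0.2, 0.1, 0.1], [0.1, 0.1, 0.9, 0.8]]
global_scores = [mse(f, r) for f, r in zip(test_frames, test_recons)]
local_scores = [region_score(region_errors(f, r, 2), models)
                for f, r in zip(test_frames, test_recons)]
fused = fuse(global_scores, local_scores)
print([round(s, 3) for s in fused])
```

In this toy run the third test frame, whose second region is reconstructed poorly, receives the highest fused score. Normalizing each score within a video before fusing keeps either term from dominating purely because of its scale.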
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-17842-0/MediaObjects/11042_2023_17842_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-17842-0/MediaObjects/11042_2023_17842_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-17842-0/MediaObjects/11042_2023_17842_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-17842-0/MediaObjects/11042_2023_17842_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-17842-0/MediaObjects/11042_2023_17842_Fig5_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-17842-0/MediaObjects/11042_2023_17842_Fig6_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-17842-0/MediaObjects/11042_2023_17842_Fig7_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-17842-0/MediaObjects/11042_2023_17842_Fig8_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-17842-0/MediaObjects/11042_2023_17842_Fig9_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-17842-0/MediaObjects/11042_2023_17842_Fig10_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-17842-0/MediaObjects/11042_2023_17842_Fig11_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-17842-0/MediaObjects/11042_2023_17842_Fig12_HTML.png)
Data Availability
The datasets analysed during the current study are available at http://101.32.75.151:8181/dataset/.
Code Availability
The code that supports the findings of this study is available at https://github.com/changxingya/code.git.
Funding
This work is supported by the National Natural Science Foundation of China (NSFC) (62202087).
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Xingya Chang, Yunhe Wu, Shizhuo Deng, Tong Jia, Dongyue Chen. The first draft of the manuscript was written by Xingya Chang and Yunhe Wu, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflicts of interest
The authors have no competing interests to declare that are relevant to the content of this article.
About this article
Cite this article
Chang, X., Wu, Y., Deng, S. et al. Conjoined triple deep network for video anomaly detection. Multimed Tools Appl 83, 59491–59518 (2024). https://doi.org/10.1007/s11042-023-17842-0
DOI: https://doi.org/10.1007/s11042-023-17842-0