Conjoined triple deep network for video anomaly detection

Multimedia Tools and Applications

Abstract

Video anomaly detection typically involves identifying anomalous targets, behaviors, and events in surveillance footage using only normal samples for training. Most mainstream models train an encoder-decoder network exclusively on normal samples and flag frames with large reconstruction errors as anomalies. Such methods face two challenges: limiting how well the reconstruction model generalizes to anomalous samples, and correcting the bias of reconstruction maps towards small-scale anomalies. To address these issues, we propose a triple-stream framework for anomaly detection that combines cross-prediction agent tasks with multiple local probabilistic models. We incorporate a dual learning mechanism in both the appearance and motion channels, whose mutual feedback makes the model overfit to normal samples and correspondingly weakens its generalization to anomalous samples. Additionally, we apply an attention mechanism to the network, design a feature consistency function to constrain bias to local features, and construct a probability model for each local region to detect larger-scale anomalies. Finally, we design a fusion scheme to evaluate anomaly scores for video frames. Evaluations on popular benchmark datasets, including UCSD, Avenue, and Street Scene, show that the proposed model achieves competitive performance compared with state-of-the-art methods.
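
The reconstruction-error baseline that the abstract describes can be illustrated with a minimal sketch. It is not the proposed triple-stream model: it assumes that reconstructed frames are already available from some encoder-decoder, scores each frame by mean squared error, and min-max normalizes the scores within one video. The function names `frame_anomaly_scores` and `fuse_scores`, and the equal-weight fusion of an appearance score and a motion score, are hypothetical choices made for illustration; the abstract does not specify the fusion scheme.

```python
import numpy as np


def frame_anomaly_scores(frames, reconstructions):
    """Score each frame by reconstruction error, scaled to [0, 1] per video.

    Both inputs are arrays of shape (num_frames, ...). Higher scores mean the
    frame was reconstructed poorly and is therefore more likely anomalous.
    """
    frames = np.asarray(frames, dtype=np.float64)
    reconstructions = np.asarray(reconstructions, dtype=np.float64)

    # Mean squared error per frame, averaged over every axis except the first.
    pixel_axes = tuple(range(1, frames.ndim))
    errors = np.mean((frames - reconstructions) ** 2, axis=pixel_axes)

    # Min-max normalization; a constant error sequence maps to all zeros.
    span = errors.max() - errors.min()
    if span == 0:
        return np.zeros_like(errors)
    return (errors - errors.min()) / span


def fuse_scores(appearance, motion, weight=0.5):
    """Hypothetical fusion: a convex combination of two per-frame scores."""
    appearance = np.asarray(appearance, dtype=np.float64)
    motion = np.asarray(motion, dtype=np.float64)
    return weight * appearance + (1.0 - weight) * motion


if __name__ == "__main__":
    # Three 2x2 "frames"; the third is reconstructed worst.
    frames = np.zeros((3, 2, 2))
    recon = np.stack([np.zeros((2, 2)),
                      np.full((2, 2), 0.1),
                      np.full((2, 2), 0.5)])
    print(frame_anomaly_scores(frames, recon))
```

In this sketch a frame would be flagged as anomalous when its fused score exceeds a chosen threshold; benchmark evaluations usually sweep that threshold to compute frame-level AUC rather than fixing it.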

Data Availability

The datasets analysed during the current study are available at http://101.32.75.151:8181/dataset/.

Code Availability

The code that supports the findings of this study is available at https://github.com/changxingya/code.git.

Funding

This work is supported by the National Natural Science Foundation of China (NSFC) (62202087).

Author information

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Xingya Chang, Yunhe Wu, Shizhuo Deng, Tong Jia, and Dongyue Chen. The first draft of the manuscript was written by Xingya Chang and Yunhe Wu, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Dongyue Chen.

Ethics declarations

Conflicts of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chang, X., Wu, Y., Deng, S. et al. Conjoined triple deep network for video anomaly detection. Multimed Tools Appl 83, 59491–59518 (2024). https://doi.org/10.1007/s11042-023-17842-0

  • DOI: https://doi.org/10.1007/s11042-023-17842-0
