Attention-guided residual frame learning for video anomaly detection

Multimedia Tools and Applications

Abstract

Anomaly detection in video surveillance data is an active research topic. The main difficulty stems from two different definitions of anomalies: semantically abnormal objects and motion caused by unauthorized changes in objects. We propose a new framework for video anomaly detection built on a convolutional long short-term memory (ConvLSTM) model that emphasizes semantic objects using self-attention mechanisms and uses concatenation operations to further improve performance. Moreover, the proposed method learns only the residuals of the next frame, which lets the model focus on anomalous objects in video frames and also stabilizes training. Our model substantially outperformed previous models on the Chinese University of Hong Kong (CUHK) Avenue and Subway Exit datasets. Our experiments also showed that both components incorporated into the framework, residual frame learning and the attention block, improve performance.
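To make the idea of learning only next-frame residuals concrete, the following is a minimal sketch and not the authors' implementation. It assumes the residual is the pixel-wise difference between consecutive frames and that a prediction is scored by mean squared error; the abstract specifies neither. The function names and the toy clip are hypothetical.

```python
import numpy as np


def residual_targets(frames: np.ndarray) -> np.ndarray:
    """Training targets for a clip of shape (T, H, W): x[t+1] - x[t].

    Assumption: "residual" means the difference between consecutive frames.
    """
    return frames[1:] - frames[:-1]


def reconstruct_next(frame: np.ndarray, predicted_residual: np.ndarray) -> np.ndarray:
    """Recover the predicted next frame by adding the residual back."""
    return frame + predicted_residual


def prediction_error(actual_next: np.ndarray, frame: np.ndarray,
                     predicted_residual: np.ndarray) -> float:
    """Mean squared error of the reconstructed next frame.

    Assumption: MSE stands in for whatever anomaly score the paper uses.
    """
    predicted_next = reconstruct_next(frame, predicted_residual)
    return float(np.mean((actual_next - predicted_next) ** 2))


# Toy clip of 4 random 8x8 grayscale frames (hypothetical data).
rng = np.random.default_rng(0)
clip = rng.random((4, 8, 8))
targets = residual_targets(clip)

# A perfect residual prediction reproduces the next frame.
perfect_error = prediction_error(clip[1], clip[0], targets[0])

# Predicting "no change" leaves an error equal to the mean squared residual.
static_error = prediction_error(clip[1], clip[0], np.zeros_like(clip[0]))
```

Under these assumptions, a network that outputs the residual only has to model what changes between frames. In a mostly static surveillance scene the targets are close to zero except where objects move, which is one plausible reading of why the abstract reports a better focus on anomalous objects.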

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2022R1A2C1007434), and also by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2021-2018-0-01431).

Author information

Corresponding author

Correspondence to Kyung-Ah Sohn.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Jun-Hyung Yu and Jeong-Hyeon Moon contributed equally to this work.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yu, JH., Moon, JH. & Sohn, KA. Attention-guided residual frame learning for video anomaly detection. Multimed Tools Appl 82, 12099–12116 (2023). https://doi.org/10.1007/s11042-022-13643-z

  • DOI: https://doi.org/10.1007/s11042-022-13643-z
