SD-Net: Understanding overcrowded scenes in real-time via an efficient dilated convolutional neural network

  • Special Issue Paper
Journal of Real-Time Image Processing

Abstract

Advances in computer vision technologies have drawn many researchers to surveillance applications, particularly the automated analysis of crowded scenes, such as counting people in very congested scenes. In crowd counting, the main goal is to count or estimate the number of people in a given scene. Understanding overcrowded scenes in real time is important for prompt responsive action. However, the task is very difficult because of several key challenges, including cluttered backgrounds, occlusion, variations in human pose and scale, and limited surveillance training data, which the existing literature addresses inadequately. To tackle these challenges, we introduce SD-Net, an end-to-end CNN architecture that produces high-quality density maps in real time and effectively counts people in extremely overcrowded scenes. The proposed architecture consists of depthwise separable, standard, and dilated 2D convolutional layers. The depthwise separable and standard 2D convolutional layers extract 2D features. Instead of pooling layers, dilated 2D convolutional layers are employed, which yield large receptive fields and reduce the number of parameters. The architecture is evaluated on four publicly available crowd analysis datasets, where it outperforms state-of-the-art methods in terms of accuracy and model size.
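
The abstract names three ingredients: depthwise separable convolutions, dilated convolutions used in place of pooling, and density maps whose sum gives the crowd count. The Python sketch below illustrates the arithmetic behind each one under stated assumptions. The helper names (`standard_conv_params`, `depthwise_separable_params`, `dilated_kernel_extent`, `stacked_receptive_field`, `gaussian_density_map`, `count_from_density`) and the demo sizes (3 x 3 kernels, 256 channels, dilation rate 2) are hypothetical and do not describe SD-Net's actual configuration. The Gaussian-kernel ground-truth scheme is a common convention in crowd counting that the abstract does not confirm for this paper.

```python
import math

# Hypothetical helpers; layer sizes in the demo are illustrative, not SD-Net's.


def standard_conv_params(k, c_in, c_out, bias=True):
    """Parameters of a k x k standard 2D convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def depthwise_separable_params(k, c_in, c_out, bias=True):
    """Depthwise k x k conv (one filter per input channel) + 1 x 1 pointwise conv."""
    depthwise = k * k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


def dilated_kernel_extent(k, d):
    """Side length covered by a k x k kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)


def stacked_receptive_field(layers):
    """Receptive field of stride-1 conv layers given as (kernel, dilation) pairs."""
    rf = 1
    for k, d in layers:
        rf += dilated_kernel_extent(k, d) - 1
    return rf


def gaussian_density_map(heads, height, width, sigma=4.0):
    """Ground-truth density map: one normalized Gaussian per head at (row, col).

    Assumption: a common crowd-counting convention, not confirmed for SD-Net.
    """
    density = [[0.0] * width for _ in range(height)]
    radius = int(3 * sigma)
    for r0, c0 in heads:
        patch = []
        for r in range(max(0, r0 - radius), min(height, r0 + radius + 1)):
            for c in range(max(0, c0 - radius), min(width, c0 + radius + 1)):
                w = math.exp(-((r - r0) ** 2 + (c - c0) ** 2) / (2 * sigma ** 2))
                patch.append((r, c, w))
        total = sum(w for _, _, w in patch)
        for r, c, w in patch:
            # Renormalize so each head adds exactly 1, even near image borders.
            density[r][c] += w / total
    return density


def count_from_density(density):
    """The estimated crowd count is the sum (integral) of the density map."""
    return sum(sum(row) for row in density)


if __name__ == "__main__":
    print(standard_conv_params(3, 256, 256))        # 590080
    print(depthwise_separable_params(3, 256, 256))  # 68352
    print(stacked_receptive_field([(3, 1)] * 3))    # 7  (no dilation)
    print(stacked_receptive_field([(3, 2)] * 3))    # 13 (dilation rate 2)
    dm = gaussian_density_map([(10, 10), (20, 30), (0, 0)], 32, 48)
    print(round(count_from_density(dm), 6))         # 3.0
```

With these illustrative sizes, the depthwise separable layer needs roughly 8.6 times fewer parameters than the standard layer (68,352 vs. 590,080), and three stacked 3 x 3 layers with dilation rate 2 cover a 13 x 13 region instead of 7 x 7 without any pooling.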

Acknowledgements

This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. 2019R1A2B5B01070067).

Author information

Correspondence to Sung Wook Baik.

About this article

Cite this article

Khan, N., Ullah, A., Haq, I.U. et al. SD-Net: Understanding overcrowded scenes in real-time via an efficient dilated convolutional neural network. J Real-Time Image Proc 18, 1729–1743 (2021). https://doi.org/10.1007/s11554-020-01020-8

  • DOI: https://doi.org/10.1007/s11554-020-01020-8
