Accelerating temporal action proposal generation via high performance computing

  • Research Article
  • Frontiers of Computer Science

Abstract

Temporal action proposal generation aims to output the starting and ending times of each potential action in long videos and often suffers from high computation cost. To address this issue, we propose a new temporal convolution network called Multipath Temporal ConvNet (MTCN). In addition, a novel high-performance ring parallel architecture is introduced into temporal action proposal generation to meet the requirements of large memory occupation and a large number of videos. Remarkably, the total data transmission is reduced by adding connections among the multiple computing nodes in the newly developed architecture. Compared with the traditional Parameter Server architecture, our parallel architecture achieves higher efficiency on temporal action detection tasks with multiple GPUs. We conduct experiments on ActivityNet-1.3 and THUMOS14, where our method outperforms other state-of-the-art temporal action detection methods with high recall and high temporal precision. In addition, a time metric is further proposed to evaluate the speed performance of the distributed training process.
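
The key idea behind the ring architecture is that gradient synchronization can avoid a central parameter server: each worker exchanges gradient chunks only with its ring neighbour, so per-node traffic stays roughly constant as GPUs are added. The paper's implementation is not reproduced here; the snippet below is a minimal NumPy sketch of the generic ring all-reduce communication pattern under that assumption, and the worker count, chunking, and the `ring_allreduce` helper are purely illustrative.

```python
# Illustrative sketch (not the authors' code): ring all-reduce over simulated
# workers. Each of the n workers holds a full gradient split into n chunks;
# after a scatter-reduce pass and an all-gather pass, every worker ends up
# with the summed gradient while only ever passing one chunk to its neighbour.
import numpy as np

def ring_allreduce(grads):
    """grads: list of equal-length 1-D arrays, one per simulated worker."""
    n = len(grads)
    # Split each worker's gradient into n chunks.
    chunks = [np.array_split(g.astype(float), n) for g in grads]

    # Scatter-reduce: at step s, worker w sends chunk (w - s) mod n to
    # worker (w + 1) mod n, which accumulates it into its own copy.
    for s in range(n - 1):
        for w in range(n):
            dst = (w + 1) % n
            c = (w - s) % n
            chunks[dst][c] = chunks[dst][c] + chunks[w][c]

    # All-gather: circulate the fully reduced chunks around the ring so that
    # every worker obtains all of them.
    for s in range(n - 1):
        for w in range(n):
            dst = (w + 1) % n
            c = (w - s + 1) % n
            chunks[dst][c] = chunks[w][c].copy()

    return [np.concatenate(c) for c in chunks]

# Toy example: 4 workers, each with an 8-element gradient.
workers = [np.arange(8) * (i + 1) for i in range(4)]
reduced = ring_allreduce(workers)
assert all(np.allclose(r, sum(workers)) for r in reduced)
print(reduced[0])  # summed gradient, identical on every worker
```

In a real multi-GPU run the two inner loops would be neighbour-to-neighbour sends over NCCL or MPI rather than in-memory copies, which is what keeps total data transmission independent of the number of workers, in contrast to a Parameter Server where traffic concentrates on the central node.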



Acknowledgements

This work was partially supported by the National Key Research and Development Program of China (2016YFE0204200), the National Natural Science Foundation of China (Grant Nos. 61972016, 62032016), Beijing Natural Science Foundation (L191007), the Fundamental Research Funds for the Central Universities (YWF-21-BJ-J-313 and YWF-20-BJ-J-612), and the Open Research Fund of Digital Fujian Environment Monitoring Internet of Things Laboratory Foundation (202004). The experimental platform is provided by Marc Casas at the Barcelona Supercomputing Center (BSC).

Author information

Corresponding author

Correspondence to Guangcun Shan.

Additional information

Tian Wang received the BE degree and MS degree from Xi'an Jiaotong University, China in 2007 and 2010, respectively. He received his PhD degree from the University of Technology of Troyes, France in 2014. He is an associate professor at the Institute of Artificial Intelligence, Beihang University, China. His research interests include artificial intelligence, machine learning, computer vision and pattern recognition.

Shiye Lei received the BS degree from Beihang University, China. He is currently pursuing the MS degree with the School of Computer Science, University of Sydney, Australia. His current research interests include machine learning and the interpretability of neural networks.

Youyou Jiang received the BS degree and MS degree from Xi'an Jiaotong University and Tsinghua University, China, respectively. His current research interests include machine learning and its applications.

Choi Chang received the BS, MS and PhD degrees in Computer Engineering from Chosun University, Korea in 2005, 2007, and 2012, respectively. He then worked at the same university as a research professor for several years before moving to Gachon University in 2020. He received an academic award from the graduate school of Chosun University in 2012 and a Korean government scholarship for graduate students (PhD course) in 2008. His research interests include intelligent information processing, the semantic web, smart IoT systems and intelligent system security.

Hichem Snoussi received his PhD degree from the University of Paris-Sud, France in 2003. Since 2010, he has been a full professor at the University of Technology of Troyes, France. His research interests include signal processing, computer vision and machine learning.

Guangcun Shan received the PhD degree from City University of Hong Kong, China in 2013, and the BE degree from Xi'an Jiaotong University, China in 2004. He is a full professor, supported by the National Talent Program, at Beihang University, China. His research interests include machine learning algorithms, first-principles calculation of functional materials, the model design and fabrication of MEMS sensors, and 2D material-based wearable flexible electronics.

Yao Fu received the PhD degree from the Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, China in 2012. She is currently an associate professor with the same institute. Her research interests include remote sensing optical imaging and image processing.


Cite this article

Wang, T., Lei, S., Jiang, Y. et al. Accelerating temporal action proposal generation via high performance computing. Front. Comput. Sci. 16, 164317 (2022). https://doi.org/10.1007/s11704-021-0173-7
