Spatiotemporal Object Detection and Activity Recognition

  • Chapter
  • First Online:
Spatiotemporal Data Analytics and Modeling

Part of the book series: Big Data Management ((BIGDM))

  • 122 Accesses

Abstract

Spatiotemporal object detection and activity recognition are essential components in the advancement of computer vision, with broad applications spanning surveillance, autonomous driving, and smart stores. This chapter offers a comprehensive overview of the techniques and applications associated with these concepts. Beginning with an introduction to the fundamental principles of object detection and activity recognition, we discuss the challenges and limitations posed by existing methods. The chapter progresses to explore spatiotemporal object detection and activity recognition, which entails capturing spatial and temporal information of moving objects in video data. A hierarchical model for spatiotemporal object detection and activity recognition is proposed, designed to maintain spatial and temporal connectivity across frames. Additionally, the chapter outlines various metrics for evaluating the performance of object detection and activity recognition models, ensuring their accuracy and effectiveness in real-world applications. Finally, we underscore the significance of spatiotemporal object detection and activity recognition in diverse fields such as surveillance, autonomous driving, and smart stores, emphasizing the potential for further research and development in these areas. In summary, this chapter provides a thorough examination of spatiotemporal object detection and activity recognition, from the foundational concepts to the latest techniques and applications. By presenting a hierarchical model and performance evaluation metrics, the chapter serves as a valuable resource for researchers and practitioners seeking to harness the power of computer vision in a variety of domains.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
EUR 32.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or Ebook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 139.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Hardcover Book
USD 179.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free ship** worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Pei, T., Huang, Q., Wang, X., Chen, X., Liu, Y., Song, C., … Zhou, C. (2021). Big geodata aggregation: Connotation, classification, and framework. National Remote Sensing Bulletin, 25(11), 2153–2162. doi: https://doi.org/10.11834/jrs.20210480

  2. Liu, Y., & **g, H. (2022). A Sports Video Behavior Recognition Using Local Spatiotemporal Patterns. Mobile Information Systems, 2022. doi: https://doi.org/10.1155/2022/4805993

  3. Wang, K., Li, X., Yang, J., Wu, J., & Li, R. (2021). Temporal action detection based on two-stream You Only Look Once network for elderly care service robot. International Journal of Advanced Robotic Systems, 18(4). doi: https://doi.org/10.1177/17298814211038342

  4. Clapham, M., Miller, E., Nguyen, M., & Darimont, C. T. (2020). Automated facial recognition for wildlife that lack unique markings: A deep learning approach for brown bears. Ecology and Evolution, 10(23), 12883–12892. doi: https://doi.org/10.1002/ece3.6840

  5. Akilan, T. (2018). Video foreground localization from traditional methods to deep learning (Doctoral dissertation, University of Windsor (Canada)).

    Google Scholar 

  6. Ramesh, S., Dall’Alba, D., Gonzalez, C., Yu, T., Mascagni, P., Mutter, D., Padoy, N. (2023). TRandAugment: temporal random augmentation strategy for surgical activity recognition from videos. International Journal of Computer Assisted Radiology and Surgery. doi: https://doi.org/10.1007/s11548-023-02864-8

  7. Cardoso, D. B., Campos, L. C. B., & Nascimento, E. R. (2022). An Action Recognition Approach with Context and Multiscale Motion Awareness. In Proceedings - 2022 35th Conference on Graphics, Patterns, and Images, SIBGRAPI 2022 (pp. 73–78). Institute of Electrical and Electronics Engineers Inc. doi: https://doi.org/10.1109/SIBGRAPI55357.2022.9991807

  8. SankaranNampoothiri, S., & Anoop BK (2014). Review on Vision based Human Activity Analysis. International Journal of Computer Applications, 99(2), 9–14. doi: https://doi.org/10.5120/17343-6240

  9. Aakur, S., Sawyer, D., Balazia, M., & Sarkar, S. (2020). An examination of proposal-based approaches to fine-grained activity detection in untrimmed surveillance videos. In 2018 TREC Video Retrieval Evaluation, TRECVID 2018. National Institute of Standards and Technology (NIST).

    Google Scholar 

  10. Sun, J., Wu, X., Yan, S., Cheong, L. F., Chua, T. S., & Li, J. (2009). Hierarchical spatio-temporal context modeling for action recognition. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009 (pp. 2004–2011). IEEE Computer Society. doi: https://doi.org/10.1109/CVPRW.2009.5206721

  11. Wang, J., Chen, Z., & Wu, Y. (2011, June). Action recognition with multiscale spatio-temporal contexts. In CVPR 2011 (pp. 3185-3192). IEEE.

    Google Scholar 

  12. Ahsan, U., Madhok, R., & Essa, I. (2019, January). Video jigsaw: Unsupervised learning of spatiotemporal context for video action recognition. In 2019 IEEE Winter Conference on Applications of Computer Vision (WACV) (pp. 179-189). IEEE.

    Google Scholar 

  13. Liu, L., Shao, L., Li, X., & Lu, K. (2016). Learning spatio-temporal representations for action recognition: A genetic programming approach. IEEE Transactions on Cybernetics, 46(1), 158–170. doi: https://doi.org/10.1109/TCYB.2015.2399172.

  14. Haroon Idrees, Khurram Soomro and Mubarak Shah, Detecting Humans in Dense Crowds using Locally-Consistent Scale Prior and Global Occlusion Reasoning, Pattern Analysis and Machine Intelligence (PAMI), IEEE Transactions, 2015.

  15. Yu, T. W., Sarwar, M. A., Daraghmi, Y. A., Cheng, S. H., Ik, T. U., & Li, Y. L. (2022). Spatiotemporal Activity Semantics Understanding Based on Foreground Object Segmentation: iCounter Scenario. IEEE Access, 10, 57748–57758. doi: https://doi.org/10.1109/ACCESS.2022.3178609

    Article  Google Scholar 

  16. **a, L., & Aggarwal, J. K. (2013). Spatio-temporal depth cuboid similarity feature for activity recognition using depth camera. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2834-2841).

    Google Scholar 

  17. Song, S., Lan, C., ** in Classifying Proportional Data," in IEEE Access, vol. 9, pp. 3712-3724, 2021, doi: https://doi.org/10.1109/ACCESS.2020.3047536.

  18. Liz Oz, Always AI (2022), 17 interesting applications of Object Detection for businesses https://alwaysai.co/blog/object-detection-for-businesses June 4, 12.30PM PST

  19. Abdellah Chehri, Hussein T. Mouftah, Autonomous vehicles in the sustainable cities, the beginning of a green adventure, Sustainable Cities and Society, Vol 51, 2019, 101751, ISSN 2210-6707, doi: https://doi.org/10.1016/j.scs.2019.101751.

  20. Torrens PM. Smart and Sentient Retail High Streets. Smart Cities. 2022; 5(4):1670-1720. doi: https://doi.org/10.3390/smartcities5040085.

  21. Lee J, Ahn B. Real-Time Human Action Recognition with a Low-Cost RGB Camera and Mobile Robot Platform. Sensors. 2020; 20(10):2886. doi: https://doi.org/10.3390/s20102886

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Vimal Kumar .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Check for updates. Verify currency and authenticity via CrossMark

Cite this chapter

Kumar, V., Jain, S., Lillis, D. (2024). Spatiotemporal Object Detection and Activity Recognition. In: A, J., Abimannan, S., El-Alfy, ES.M., Chang, YS. (eds) Spatiotemporal Data Analytics and Modeling. Big Data Management. Springer, Singapore. https://doi.org/10.1007/978-981-99-9651-3_6

Download citation

  • DOI: https://doi.org/10.1007/978-981-99-9651-3_6

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-9650-6

  • Online ISBN: 978-981-99-9651-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Navigation