Learning clustered deep spatio-temporal prototypes using softmax regression for video information systems

  • Original Research
  • International Journal of Information Technology

Abstract

To address data imbalance, dimensionality reduction, and training-time optimization, this research develops a method that uses clustering algorithms to produce an optimized number of prototypes, which serve as training data for a logistic regression classifier for video retrieval. The number of prototypes is optimized based on the number of samples in the smallest class. This clustering-based sampling yields fewer training samples and shorter training times. For retrieval, the classified value of each query video is compared with the classified value of the prototype features for each class in the database. With a classifier trained on clustered prototypes of pre-trained 3D ResNet features, the top-1 retrieval accuracy under L2 (Euclidean) distance improves over recent methods on both UCF101 and HMDB51, reaching 0.844 on UCF101 and 0.5620 on HMDB51. Relative to the original features without machine learning, the top-1 mean average precision of retrieval improves by 5.1% on UCF101 and by 1.9% on HMDB51.
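The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes k-means as the clustering step (the abstract names no specific algorithm), uses toy 2-D vectors in place of pre-trained 3D ResNet features, and picks arbitrary hyperparameters. All function names (`clustered_prototypes`, `train_softmax`, `retrieve_top1`, and the helpers) are hypothetical.

```python
import math
import random
from collections import defaultdict


def l2(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed choice); centroids act as prototypes."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: l2(p, centroids[j]))
            groups[nearest].append(p)
        # Recompute each centroid as its group's mean; keep it if the group is empty.
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids


def clustered_prototypes(features, labels):
    """Reduce every class to k prototypes, with k = size of the smallest class."""
    by_class = defaultdict(list)
    for f, y in zip(features, labels):
        by_class[y].append(f)
    k = min(len(v) for v in by_class.values())
    return {y: kmeans(v, k) for y, v in by_class.items()}


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def predict_proba(W, x):
    """Class probabilities for x; the last weight in each row is the bias."""
    xb = list(x) + [1.0]
    return softmax([sum(w * v for w, v in zip(row, xb)) for row in W])


def train_softmax(X, y, classes, lr=0.1, epochs=200):
    """Batch gradient descent on the softmax-regression cross-entropy loss."""
    d = len(X[0])
    W = [[0.0] * (d + 1) for _ in classes]
    for _ in range(epochs):
        grad = [[0.0] * (d + 1) for _ in classes]
        for x, label in zip(X, y):
            xb = list(x) + [1.0]
            p = predict_proba(W, x)
            for c, cls in enumerate(classes):
                err = p[c] - (1.0 if cls == label else 0.0)
                for j in range(d + 1):
                    grad[c][j] += err * xb[j]
        for c in range(len(classes)):
            for j in range(d + 1):
                W[c][j] -= lr * grad[c][j] / len(X)
    return W


def retrieve_top1(query, W, prototypes):
    """Class of the prototype whose classifier output is L2-closest to the query's."""
    q = predict_proba(W, query)
    return min(
        (l2(q, predict_proba(W, p)), y)
        for y, ps in prototypes.items()
        for p in ps
    )[1]


# Toy, imbalanced 2-D "features": six samples of class "a", two of class "b".
features = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [0.1, 0.2], [-0.2, -0.1],
            [0.0, -0.2], [5.0, 5.1], [4.8, 5.0]]
labels = ["a"] * 6 + ["b"] * 2

prototypes = clustered_prototypes(features, labels)  # two prototypes per class
classes = sorted(prototypes)
X = [p for c in classes for p in prototypes[c]]
y = [c for c in classes for _ in prototypes[c]]
W = train_softmax(X, y, classes)  # trained on prototypes only, not on all samples

print(retrieve_top1([4.5, 5.2], W, prototypes))
print(retrieve_top1([0.1, -0.2], W, prototypes))
```

Because every class is reduced to the same number of prototypes, the classifier sees a balanced training set that is no larger than the number of classes times the smallest class size, which is where the shorter training time in this sketch comes from.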

Availability of data and materials

The datasets analysed during the current study are available at https://www.crcv.ucf.edu/data/UCF101/UCF101.rar (UCF101) and http://serre-lab.clps.brown.edu/wpcontent/uploads/2013/10/hmdb51org.rar (HMDB51).

Funding

The authors did not receive financial support from any organization for the submitted work.

Author information

Corresponding author

Correspondence to Alina Banerjee.

Ethics declarations

Ethical approval

This article does not contain any studies involving human participants performed by any of the authors. This article also does not contain any studies involving animals performed by any of the authors.

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Banerjee, A., Kumar, E. & Ravinder, M. Learning clustered deep spatio-temporal prototypes using softmax regression for video information systems. Int. J. Inf. Technol. 16, 3085–3091 (2024). https://doi.org/10.1007/s41870-024-01826-w

  • DOI: https://doi.org/10.1007/s41870-024-01826-w
