Abstract
To address data imbalance, high dimensionality, and long training times, this research develops a method that uses clustering algorithms to produce an optimized number of prototypes, which serve as training data for a logistic regression classifier for video retrieval. The number of prototypes is optimized based on the number of samples in the smallest class. Clustering-based sampling extracts the prototypes, which results in fewer training samples and shorter training times. At retrieval time, the classified value of each query video is compared with the classified value of the prototype features of each class in the database. With a classifier trained on clustered prototypes of pre-trained 3D ResNet features, the top-1 retrieval accuracy under L2 (Euclidean) distance on both the UCF101 and HMDB51 datasets exceeds that of recent methods. The observed top-1 accuracy is 0.844 on UCF101 and 0.5620 on HMDB51. Compared with the original features without machine learning, the top-1 mean average precision of retrieval improves by 5.1% on UCF101 and by 1.9% on HMDB51.
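The pipeline summarized above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: k-means stands in for the unspecified clustering algorithms, synthetic Gaussian vectors stand in for 3D ResNet features, the comparison of "classified values" is read as an L2 distance between softmax outputs, and all function names and hyperparameters are hypothetical.

```python
import numpy as np

def kmeans_prototypes(X, k, n_iter=20, seed=0):
    """Return k centroids of X using plain Lloyd's k-means (assumed clusterer)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign every sample to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

def clustered_prototypes(X, y):
    """Reduce every class to k prototypes, with k = size of the smallest class."""
    classes, counts = np.unique(y, return_counts=True)
    k = counts.min()
    protos = [kmeans_prototypes(X[y == c], k) for c in classes]
    return np.vstack(protos), np.repeat(classes, k)

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # subtract row max for stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def fit_softmax_regression(X, y, n_classes, lr=0.1, n_iter=500):
    """Multinomial logistic regression trained by batch gradient descent."""
    W = np.zeros((X.shape[1], n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets; labels must be 0..n_classes-1
    for _ in range(n_iter):
        P = softmax(X @ W + b)
        W -= lr * X.T @ (P - Y) / len(X)
        b -= lr * (P - Y).mean(axis=0)
    return W, b

def retrieve(query, W, b, proto_X, proto_y):
    """Return prototype labels ranked by L2 distance between classifier outputs."""
    q = softmax(query[None, :] @ W + b)
    p = softmax(proto_X @ W + b)
    order = np.linalg.norm(p - q, axis=1).argsort()
    return proto_y[order]

# Synthetic, imbalanced stand-in for deep video features (3 classes, 16-D).
rng = np.random.default_rng(42)
sizes = [60, 25, 8]
centers = 6.0 * np.eye(3, 16)  # class c is shifted along dimension c
X = np.vstack([rng.normal(loc=centers[c], scale=1.0, size=(n, 16))
               for c, n in enumerate(sizes)])
y = np.repeat(np.arange(3), sizes)

proto_X, proto_y = clustered_prototypes(X, y)
W, b = fit_softmax_regression(proto_X, proto_y, n_classes=3)

query = rng.normal(loc=centers[1], scale=1.0)  # a query drawn from class 1
ranking = retrieve(query, W, b, proto_X, proto_y)
print(proto_X.shape, ranking[:5])
```

In this toy setting the 93 original samples are reduced to 8 prototypes per class, so the classifier sees a balanced and much smaller training set. In practice, library implementations such as scikit-learn's `KMeans` and `LogisticRegression` would likely replace the hand-written routines.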
Availability of data and materials
The datasets analysed during the current study are available at the following links: UCF101 [https://www.crcv.ucf.edu/data/UCF101/ UCF101.rar] and HMDB51 [http://serre-lab.clps.brown.edu/wpcontent/uploads/2013/10/hmdb51org.rar].
Funding
The authors did not receive financial support from any organization for the submitted work.
Ethics declarations
Ethical approval
This article does not contain any studies involving human participants performed by any of the authors. This article also does not contain any studies involving animals performed by any of the authors.
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
About this article
Cite this article
Banerjee, A., Kumar, E. & Ravinder, M. Learning clustered deep spatio-temporal prototypes using softmax regression for video information systems. Int. j. inf. tecnol. 16, 3085–3091 (2024). https://doi.org/10.1007/s41870-024-01826-w
DOI: https://doi.org/10.1007/s41870-024-01826-w