Learning clustered deep spatio-temporal prototypes using softmax regression for video information systems

Banerjee, Alina; Kumar, Ela; Ravinder, M.

doi:10.1007/s41870-024-01826-w

Original Research
Published: 29 March 2024

Volume 16, pages 3085–3091, (2024)
International Journal of Information Technology


Abstract

To address data imbalance, dimensionality reduction, and training-time optimization, this research develops a method that uses clustering algorithms to produce an optimized number of prototypes, which serve as training data for a logistic regression classifier used in video retrieval. The optimization is based on the minimum number of samples in a class. Clustering-based sampling extracts the prototypes, which yields fewer training samples and shorter training times. The classified value of each query video is compared with the classified value of the prototype features for each class in the database. With a classifier trained on clustered prototypes of pre-trained 3D ResNet features, the top-1 accuracy of the retrieval results under L2 (Euclidean) distance improves over recent methods on both the UCF101 and HMDB51 datasets. The observed top-1 accuracy is 0.844 on UCF101 and 0.5620 on HMDB51. Compared with the original features without machine learning, the top-1 mean average precision of retrieval improves by 5.1% on UCF101 and by 1.9% on HMDB51.
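
The pipeline outlined in the abstract can be illustrated with a minimal sketch, under several assumptions that are not taken from the paper: random 2-D points stand in for the pre-trained 3D ResNet features, plain k-means is the clustering algorithm, every class is reduced to k prototypes with k equal to the size of the smallest class, and the softmax regression is trained by batch gradient descent. The function names (`kmeans`, `class_prototypes`, `fit_softmax`, `retrieve_top1`) and all hyperparameters are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns the k cluster centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every sample to its nearest centroid (Euclidean distance).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids


def class_prototypes(X, y):
    """Assumption: k prototypes per class, k = size of the smallest class."""
    classes, counts = np.unique(y, return_counts=True)
    k = counts.min()
    protos = [kmeans(X[y == c], k) for c in classes]
    return np.vstack(protos), np.repeat(classes, k)


def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # shift for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)


def fit_softmax(X, y, n_classes, lr=0.1, n_iter=300):
    """Softmax (multinomial logistic) regression by batch gradient descent."""
    W = np.zeros((X.shape[1], n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(n_iter):
        G = (softmax(X @ W + b) - Y) / len(X)  # gradient w.r.t. the logits
        W -= lr * (X.T @ G)
        b -= lr * G.sum(axis=0)
    return W, b


def retrieve_top1(query, protos, proto_y, W, b):
    """Return the class of the prototype whose classifier output is
    closest (L2 distance) to the classifier output of the query."""
    q = softmax(query[None, :] @ W + b)
    p = softmax(protos @ W + b)
    return proto_y[np.linalg.norm(p - q, axis=1).argmin()]


# Synthetic, imbalanced stand-in for per-video feature vectors.
rng = np.random.default_rng(0)
centers = np.array([[4.0, 0.0], [0.0, 4.0], [-4.0, -4.0]])
sizes = [60, 25, 10]
X = np.vstack([c + rng.normal(scale=0.5, size=(n, 2))
               for c, n in zip(centers, sizes)])
y = np.repeat(np.arange(3), sizes)

protos, proto_y = class_prototypes(X, y)       # 10 prototypes per class
W, b = fit_softmax(protos, proto_y, n_classes=3)
pred = retrieve_top1(np.array([3.8, 0.3]), protos, proto_y, W, b)
print(protos.shape, pred)
```

In this sketch the 95 imbalanced samples shrink to a balanced set of 30 prototypes before the classifier is trained, which is the mechanism the abstract credits for fewer training samples and shorter training times. Other clustering algorithms or a library classifier could be substituted without changing the overall structure.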

Availability of data and materials

The datasets analysed during the current study are available at the following web links: UCF101 [https://www.crcv.ucf.edu/data/UCF101/UCF101.rar] and HMDB51 [http://serre-lab.clps.brown.edu/wpcontent/uploads/2013/10/hmdb51org.rar].

Funding

The authors did not receive financial support from any organization for the submitted work.

Author information

Ela Kumar and M. Ravinder contributed equally to this work.

Authors and Affiliations

Department of Computer Science and Engineering, Indira Gandhi Delhi Technical University for Women, New Church Rd., New Delhi, 11006, India
Alina Banerjee, Ela Kumar & M. Ravinder
Department of Computer Science and Engineering, G D Goenka University, Sohna, Haryana, 122103, India
Alina Banerjee

Corresponding author

Correspondence to Alina Banerjee.

Ethics declarations

Ethical approval

This article does not contain any studies involving human participants performed by any of the authors. This article also does not contain any studies involving animals performed by any of the authors.

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Banerjee, A., Kumar, E. & Ravinder, M. Learning clustered deep spatio-temporal prototypes using softmax regression for video information systems. Int. j. inf. tecnol. 16, 3085–3091 (2024). https://doi.org/10.1007/s41870-024-01826-w

Received: 04 December 2023
Accepted: 13 March 2024
Published: 29 March 2024
Issue Date: June 2024
DOI: https://doi.org/10.1007/s41870-024-01826-w
