Keyframe Insights into Real-Time Video Tagging of Compressed UHD Content

  • Conference paper
Image Analysis and Processing – ICIAP 2022 (ICIAP 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13233)


Abstract

We present a method that can analyze coded ultra-high-resolution (UHD) video content an order of magnitude faster than real time. We observe that the larger the resolution of a video, the larger the fraction of the overall processing time that is spent on decoding frames from the video. In this paper, we exploit the way video is coded to significantly speed up the frame decoding process. More precisely, we decode only keyframes, which can be decoded significantly faster than ‘random’ frames in the video. A key insight is that in modern video codecs, keyframes are often placed around scene changes (shot boundaries), and hence form a very representative subset of the frames of the video. Using video genre tagging as an example, we show that keyframes lend themselves well to video analysis tasks. Unlike previous genre prediction methods, which combine a multitude of signals, we train a per-frame genre classification system using a CNN that takes only (key-)frames as input. We show that the aggregated genre predictions are very competitive with much more involved methods at predicting the video genre(s), and even outperform state-of-the-art genre tagging methods that rely solely on video frames as input. The proposed system can reliably tag the genres of a compressed video between 12× (8K content) and 96× (1080p content) faster than real time.
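The aggregation step and the reported speed-ups can be illustrated with a minimal sketch. The abstract does not say how per-frame predictions are aggregated, so the averaging-plus-threshold rule below is an assumption. The function names, the threshold value, and the example scores are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def aggregate_genre_scores(
    frame_scores: Sequence[Dict[str, float]],
    threshold: float = 0.5,  # assumed cut-off, not a value from the paper
) -> List[str]:
    """Average per-keyframe genre scores and keep genres at or above the threshold.

    Assumption: simple mean pooling over keyframes; the paper's actual
    aggregation rule is not stated in the abstract.
    """
    if not frame_scores:
        return []
    totals: Dict[str, float] = {}
    for scores in frame_scores:
        for genre, score in scores.items():
            totals[genre] = totals.get(genre, 0.0) + score
    n_frames = len(frame_scores)
    means = {genre: total / n_frames for genre, total in totals.items()}
    return sorted(genre for genre, mean in means.items() if mean >= threshold)


def processing_seconds(video_seconds: float, speedup: float) -> float:
    """Wall-clock time needed when processing runs `speedup` times faster than real time."""
    return video_seconds / speedup


# Hypothetical per-keyframe CNN outputs for three keyframes of one video.
example_scores = [
    {"action": 0.9, "comedy": 0.2},
    {"action": 0.7, "comedy": 0.6},
    {"action": 0.8, "comedy": 0.1},
]
predicted_genres = aggregate_genre_scores(example_scores)

# Time to tag a one-hour video at the two speed-ups quoted in the abstract.
seconds_8k = processing_seconds(3600, 12)
seconds_1080p = processing_seconds(3600, 96)
```

Under these assumptions, a one-hour video would be tagged in roughly five minutes at 12× and in well under a minute at 96×. Mean pooling is used here only because it is the simplest multi-label aggregation; other rules, such as max pooling or voting, would fit the same interface.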

Notes

  1. The resizing to the same resolution guarantees that writing out the frames to disk does not further penalize larger resolution videos.

Author information

Correspondence to Dominic Rüfenacht.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Rüfenacht, D. (2022). Keyframe Insights into Real-Time Video Tagging of Compressed UHD Content. In: Sclaroff, S., Distante, C., Leo, M., Farinella, G.M., Tombari, F. (eds) Image Analysis and Processing – ICIAP 2022. ICIAP 2022. Lecture Notes in Computer Science, vol 13233. Springer, Cham. https://doi.org/10.1007/978-3-031-06433-3_13

  • DOI: https://doi.org/10.1007/978-3-031-06433-3_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-06432-6

  • Online ISBN: 978-3-031-06433-3

  • eBook Packages: Computer Science (R0)
