Keyframe Insights into Real-Time Video Tagging of Compressed UHD Content

Rüfenacht, Dominic

doi:10.1007/978-3-031-06433-3_13

Dominic Rüfenacht (ORCID: orcid.org/0000-0002-1450-8070)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13233)

Included in the following conference series:

International Conference on Image Analysis and Processing

Abstract

We present a method that can analyze coded ultra-high resolution (UHD) video content an order of magnitude faster than real time. We observe that the larger the resolution of a video, the larger the fraction of the overall processing time that is spent on decoding frames from the video. In this paper, we exploit the way video is coded to significantly speed up the frame decoding process. More precisely, we decode only keyframes, which can be decoded significantly faster than ‘random’ frames in the video. A key insight is that modern video codecs often place keyframes around scene changes (shot boundaries), so keyframes form a very representative subset of the frames of the video. Using video genre tagging as an example, we show that keyframes lend themselves well to video analysis tasks. Unlike previous genre prediction methods, which combine a multitude of signals, we train a per-frame genre classification system using a CNN that takes only (key-)frames as input. We show that the aggregated genre predictions are very competitive with those of much more involved methods at predicting the video genre(s), and even outperform state-of-the-art genre tagging methods that rely solely on video frames as input. The proposed system can reliably tag the genres of a compressed video between 12× (8K content) and 96× (1080p content) faster than real time.
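
As a rough illustration of the aggregation step and of what the reported speed-up factors mean in practice, the following Python sketch may help. It is not the paper's implementation: the abstract does not state how per-frame predictions are aggregated, so averaging followed by a fixed threshold is an assumption, and the function names, genre labels, scores, and the 0.5 threshold are all hypothetical. Keyframe decoding and the CNN itself are out of scope here; the sketch starts from per-keyframe scores that such a classifier might produce.

```python
from statistics import fmean


def aggregate_genres(frame_scores, threshold=0.5):
    """Average per-keyframe genre scores and keep genres above a threshold.

    Assumption: mean pooling plus a fixed threshold; the paper's actual
    aggregation rule is not given in the abstract.
    """
    if not frame_scores:
        return {}, []
    genres = frame_scores[0].keys()
    mean_scores = {g: fmean(s[g] for s in frame_scores) for g in genres}
    tags = sorted(g for g, v in mean_scores.items() if v >= threshold)
    return mean_scores, tags


def processing_seconds(duration_s, speedup):
    """Time needed to process a video at `speedup` times real time."""
    return duration_s / speedup


# Hypothetical classifier outputs for three keyframes of one video.
keyframe_scores = [
    {"action": 0.9, "comedy": 0.2, "drama": 0.3, "thriller": 0.6},
    {"action": 0.7, "comedy": 0.1, "drama": 0.4, "thriller": 0.7},
    {"action": 0.8, "comedy": 0.3, "drama": 0.2, "thriller": 0.8},
]

mean_scores, tags = aggregate_genres(keyframe_scores)
print(tags)  # multi-label result: genres whose mean score passes the threshold

# Speed-up factors from the abstract, applied to a 60-second clip.
print(processing_seconds(60.0, 12))  # 8K content
print(processing_seconds(60.0, 96))  # 1080p content
```

Under these assumptions, a one-minute 8K clip would take about 5 seconds to tag and a one-minute 1080p clip well under a second. Mean pooling is chosen here only because it is the simplest way to turn per-frame scores into video-level, multi-label tags.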

Notes

1.
The resizing to the same resolution guarantees that writing out the frames to disk does not further penalize larger resolution videos.

Author information

Authors and Affiliations

Mobius Labs GmbH, Berlin, Germany
Dominic Rüfenacht

Corresponding author

Correspondence to Dominic Rüfenacht.

Editor information

Editors and Affiliations

Boston University, Boston, MA, USA
Stan Sclaroff
National Research Council, Lecce, Italy
Cosimo Distante
National Research Council, Lecce, Italy
Marco Leo
University of Catania, Catania, Italy
Giovanni M. Farinella
Technische Universität München, Garching, Germany
Federico Tombari

About this paper

Cite this paper

Rüfenacht, D. (2022). Keyframe Insights into Real-Time Video Tagging of Compressed UHD Content. In: Sclaroff, S., Distante, C., Leo, M., Farinella, G.M., Tombari, F. (eds) Image Analysis and Processing – ICIAP 2022. ICIAP 2022. Lecture Notes in Computer Science, vol 13233. Springer, Cham. https://doi.org/10.1007/978-3-031-06433-3_13

DOI: https://doi.org/10.1007/978-3-031-06433-3_13
Published: 15 May 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06432-6
Online ISBN: 978-3-031-06433-3
eBook Packages: Computer Science, Computer Science (R0)
