Abstract
We present a method that can analyze coded ultra-high-resolution (UHD) video content an order of magnitude faster than real time. We observe that the higher the resolution of a video, the larger the fraction of the overall processing time that is spent on decoding frames. In this paper, we exploit the way video is coded to significantly speed up frame decoding. More precisely, we decode only keyframes, which can be decoded significantly faster than arbitrary ('random') frames in the video. A key insight is that in modern video codecs, keyframes are often placed around scene changes (shot boundaries) and hence form a highly representative subset of the video's frames. Using video genre tagging as an example, we show that keyframes lend themselves well to video analysis tasks. Unlike previous genre prediction methods, which combine a multitude of signals, we train a per-frame genre classification system using a CNN that takes only (key)frames as input. We show that the aggregated genre predictions are very competitive with much more involved methods at predicting the video genre(s), and even outperform state-of-the-art genre tagging methods that rely solely on video frames as input. The proposed system can reliably tag the genres of a compressed video between 12× (8K content) and 96× (1080p content) faster than real time.
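To make the aggregation step concrete, the following is a minimal sketch of how per-keyframe genre probabilities might be combined into video-level tags. The abstract does not specify the aggregation rule, so the mean pooling, the 0.5 threshold, and the name `aggregate_genres` are all assumptions made for illustration rather than the method used in the paper.

```python
from typing import Dict, Sequence


def aggregate_genres(
    frame_probs: Sequence[Sequence[float]],
    genres: Sequence[str],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Dict[str, float]:
    """Average per-keyframe genre probabilities and keep confident genres.

    frame_probs[k][g] is the (hypothetical) CNN probability that keyframe k
    belongs to genre g. Mean pooling is an assumed aggregation rule.
    """
    if not frame_probs:
        raise ValueError("need at least one keyframe prediction")
    n_frames = len(frame_probs)
    # Mean probability per genre across all keyframes.
    means = [
        sum(probs[g] for probs in frame_probs) / n_frames
        for g in range(len(genres))
    ]
    # Multi-label output: every genre whose mean reaches the threshold.
    return {name: m for name, m in zip(genres, means) if m >= threshold}


# Two keyframes, two genres: only "action" reaches the threshold.
tags = aggregate_genres([[1.0, 0.25], [0.5, 0.25]], ["action", "comedy"])
```

Because each genre is thresholded independently, a video can receive several genre tags at once, which matches the multi-label setting implied by "video genre(s)".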
Notes
1. Resizing all frames to the same resolution guarantees that writing the frames to disk does not further penalize higher-resolution videos.
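As an illustration of keyframe-only decoding with a fixed output resolution, the sketch below builds an `ffmpeg` command line that skips non-keyframes and rescales every written frame. It is one possible setup, not necessarily the tooling used in the paper: the helper name `keyframe_extraction_command`, the 224×224 target size, and the file names are assumptions.

```python
from typing import List


def keyframe_extraction_command(
    video_path: str,
    out_pattern: str,
    width: int = 224,   # assumed target size, not taken from the paper
    height: int = 224,
) -> List[str]:
    """Build an ffmpeg argument list that decodes and writes only keyframes."""
    return [
        "ffmpeg",
        # Decoder option placed before -i: discard all non-keyframes.
        "-skip_frame", "nokey",
        "-i", video_path,
        # Resize every frame to one fixed resolution before writing it out.
        "-vf", f"scale={width}:{height}",
        # Do not duplicate frames to fill timestamp gaps
        # (recent ffmpeg releases prefer "-fps_mode vfr").
        "-vsync", "vfr",
        out_pattern,
    ]


cmd = keyframe_extraction_command("input.mp4", "keyframe_%05d.jpg")
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine with `ffmpeg` installed; building it separately keeps the sketch testable without a video file.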
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Rüfenacht, D. (2022). Keyframe Insights into Real-Time Video Tagging of Compressed UHD Content. In: Sclaroff, S., Distante, C., Leo, M., Farinella, G.M., Tombari, F. (eds) Image Analysis and Processing – ICIAP 2022. ICIAP 2022. Lecture Notes in Computer Science, vol 13233. Springer, Cham. https://doi.org/10.1007/978-3-031-06433-3_13
DOI: https://doi.org/10.1007/978-3-031-06433-3_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06432-6
Online ISBN: 978-3-031-06433-3
eBook Packages: Computer Science, Computer Science (R0)