Abstract
With the rapid development of digital equipment and the continuous upgrading of online media, a growing number of people are willing to post videos on the web to share their daily lives [1, 2]. Not every segment of a video appeals to audiences, and some segments may be boring. In recent years, crowd-sourced time-sync video comments have emerged worldwide, supporting further research on temporal video labelling. In this paper, we propose a novel framework for predicting which segments of a newly generated video stream will be popular among audiences. Experimental results on real-world data demonstrate the effectiveness of the proposed framework and support the idea of using crowd-sourced time-sync comments as a bridge for analysing videos and predicting the popularity of their segments.
References
Jelodar, A.B., Paulius, D., Sun, Y.: Long activity video understanding using functional object-oriented network. IEEE Trans. Multimedia (2018)
Chen, Y., Hao, C., Liu, A.X., et al.: Multi-level model for video object segmentation based on supervision optimization. IEEE Trans. Multimedia (2019)
Liu, J.Y., Yang, Y.H., Jeng, S.K.: Weakly-supervised visual instrument-playing action detection in videos. IEEE Trans. Multimedia (2018)
Yang, Y., Zhou, J., Ai, J., et al.: Video captioning by adversarial LSTM. IEEE Trans. Image Process. 27(11), 5600–5611 (2018)
Zhang, M., Yang, Y., Zhang, H., et al.: More is better: precise and detailed image captioning using online positive recall and missing concepts mining. IEEE Trans. Image Process. 28(1), 32–44 (2019)
Qiu, Y., Liu, Y., Arteaga-Falconi, J., et al.: EVM-CNN: real-time contactless heart rate estimation from facial video. IEEE Trans. Multimedia (2018)
Bin, Y., Yang, Y., Shen, F., et al.: Describing video with attention-based bidirectional LSTM. IEEE Trans. Cybern. 99, 1–11 (2018)
Hu, M., Yang, Y., Shen, F., et al.: Collective reconstructive embeddings for cross-modal hashing. IEEE Trans. Image Process. (2018)
Li, H., Ma, X., Wang, F., et al.: On popularity prediction of videos shared in online social networks. In: Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, pp. 169–178. ACM (2013)
Yang, Y., Duan, Y., Wang, X., et al.: Hierarchical multi-clue modelling for POI popularity prediction with heterogeneous tourist information. IEEE Trans. Knowl. Data Eng. (2018)
Wu, B., Zhong, E., Tan, B., et al.: Crowdsourced time-sync video tagging using temporal and personalized topic modeling. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 721–730. ACM (2014)
Hamasaki, M., Takeda, H., Hope, T., et al.: Network analysis of an emergent massively collaborative creation community: how can people create videos collaboratively without collaboration? In: Third International AAAI Conference on Weblogs and Social Media (2009)
Wu, Z., Ito, E.: Correlation analysis between user’s emotional comments and popularity measures. In: 2014 IIAI 3rd International Conference on Advanced Applied Informatics, pp. 280–283. IEEE (2014)
Lv, G., Xu, T., Chen, E., et al.: Reading the videos: temporal labeling for crowdsourced time-sync videos based on semantic embedding. In: Thirtieth AAAI Conference on Artificial Intelligence (2016)
Ping, Q., Chen, C.: Video highlights detection and summarization with lag-calibration based on concept-emotion mapping of crowd-sourced time-sync comments. arXiv preprint arXiv:1708.02210 (2017)
Girgensohn, A., Boreczky, J.: Time-constrained keyframe selection technique. Multimedia Tools Appl. 11(3), 347–358 (2000)
Jiao, Y., Li, Z., Huang, S., et al.: Three-dimensional attention-based deep ranking model for video highlight detection. IEEE Trans. Multimedia 20(10), 2693–2705 (2018)
Merler, M., Mac, K.N.C., Joshi, D., et al.: Automatic curation of sports highlights using multimodal excitement features. IEEE Trans. Multimedia (2018)
Lin, K.S., Lee, A., Yang, Y.H., et al.: Automatic highlights extraction for drama video using music emotion and human face features. Neurocomputing 119, 111–117 (2013)
Hanjalic, A., Xu, L.Q.: Affective video content representation and modeling. IEEE Trans. Multimedia 7(1), 143–154 (2005)
Ngo, C.W., Ma, Y.F., Zhang, H.J.: Video summarization and scene detection by graph modeling. IEEE Trans. Circuits Syst. Video Technol. 15(2), 296–305 (2005)
Kurach, K., Gelly, S., Jastrzebski, M., et al.: Better text understanding through image-to-text transfer. arXiv preprint arXiv:1705.08386 (2017)
Ferman, A.M., Tekalp, A.M., Mehrotra, R.: Robust color histogram descriptors for video segment retrieval and identification. IEEE Trans. Image Process. 11(5), 497–508 (2002)
Yao, L., Torabi, A., Cho, K., et al.: Describing videos by exploiting temporal structure. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4507–4515 (2015)
Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Advances in Neural Information Processing Systems, pp. 3104–3112 (2014)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Pan, P., Xu, Z., Yang, Y., et al.: Hierarchical recurrent neural encoder for video representation with application to captioning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1029–1038 (2016)
Venugopalan, S., Rohrbach, M., Donahue, J., et al.: Sequence to sequence-video to text. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4534–4542 (2015)
Venugopalan, S., Hendricks, L.A., Mooney, R., et al.: Improving LSTM-based video description with linguistic knowledge mined from text. arXiv preprint arXiv:1604.01729 (2016)
Huang, W., Chan, K.L., Li, H., Lim, J.H., Liu, J., Wong, T.Y.: Content-based medical image retrieval with metric learning via rank correlation. In: Wang, F., Yan, P., Suzuki, K., Shen, D. (eds.) MLMI 2010. LNCS, vol. 6357, pp. 18–25. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15948-0_3
Kiros, R., Salakhutdinov, R., Zemel, R.S.: Unifying visual-semantic embeddings with multimodal neural language models. arXiv preprint arXiv:1411.2539 (2014)
He, K., Zhang, X., Ren, S., et al.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Hu, M., et al.: Hashing with angular reconstructive embeddings. IEEE Trans. Image Process. 27(2), 545–555 (2018)
Acknowledgements
This work is supported by the Major Scientific and Technological Special Project of Guizhou Province (20183002).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhou, J., Ai, J., Wang, Z., Chen, S., Wei, Q. (2019). Discovering Attractive Segments in the User Generated Video Streams. In: Shao, J., Yiu, M., Toyoda, M., Zhang, D., Wang, W., Cui, B. (eds) Web and Big Data. APWeb-WAIM 2019. Lecture Notes in Computer Science, vol 11642. Springer, Cham. https://doi.org/10.1007/978-3-030-26075-0_18
DOI: https://doi.org/10.1007/978-3-030-26075-0_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26074-3
Online ISBN: 978-3-030-26075-0
eBook Packages: Computer Science, Computer Science (R0)