Abstract
This paper presents a new video retrieval tool, the Interactive VIdeo Search Tool (IVIST), which participates in the 2020 Video Browser Showdown (VBS). IVIST is equipped with high-performing functionalities such as object detection, dominant-color finding, scene-text recognition, and text-image retrieval, all of which are built on deep neural networks. With these functionalities, IVIST performs well at finding the videos users are looking for. Furthermore, its user-friendly interface makes it easy to use even for novice users. Although IVIST was developed to participate in VBS, we hope that it will serve as a practical video retrieval tool in the future, handling real video data on the Internet.
S. Park and J. Song contributed equally.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Park, S., Song, J., Park, M., Ro, Y.M. (2020). IVIST: Interactive VIdeo Search Tool in VBS 2020. In: Ro, Y., et al. MultiMedia Modeling. MMM 2020. Lecture Notes in Computer Science, vol 11962. Springer, Cham. https://doi.org/10.1007/978-3-030-37734-2_74
DOI: https://doi.org/10.1007/978-3-030-37734-2_74
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37733-5
Online ISBN: 978-3-030-37734-2
eBook Packages: Computer Science, Computer Science (R0)