Abstract
In recent decades, work at the intersection of computer vision (CV) and natural language processing (NLP) has grown in popularity, and many researchers have focused on the image caption task. More recently, image aesthetic description has attracted growing interest because an image's aesthetics are indicative of its quality level. In this study, we present a technique that produces image descriptions and aesthetic descriptions at the same time. We use a Siamese network to train on datasets from two data domains: the image caption task and the image aesthetic description task. After training, the learned parameters are migrated back to a conventional Encoder-Decoder model for testing. For the image caption task, we chose the Flickr8k dataset to reduce computing cost. For the aesthetic task, we used the PCCD dataset. The results indicate that our technique can train on datasets from two data domains simultaneously and produce both kinds of image descriptions.
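The train-then-migrate workflow described above can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the architecture, loss, or optimizer, so everything here is an assumption made for illustration. The names `ToyEncoderDecoder`, `train_siamese`, and `migrate` are hypothetical, a single linear layer stands in for the Encoder-Decoder, and squared error stands in for the captioning loss. The only ideas it tries to show are that both Siamese branches share one set of parameters while seeing data from different domains, and that those parameters are then copied into a standalone model.

```python
import copy
import random


class ToyEncoderDecoder:
    """Hypothetical stand-in for an Encoder-Decoder: output = W @ features."""

    def __init__(self, n_in, n_out, seed=0):
        rng = random.Random(seed)
        self.params = {
            "W": [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                  for _ in range(n_out)]
        }

    def forward(self, x):
        return [sum(w * xi for w, xi in zip(row, x))
                for row in self.params["W"]]


def squared_error_grad(model, x, target):
    """Gradient of 0.5 * ||forward(x) - target||^2 with respect to W."""
    err = [p - t for p, t in zip(model.forward(x), target)]
    return [[e * xi for xi in x] for e in err]


def train_siamese(caption_data, aesthetic_data, n_in, n_out, lr=0.1, epochs=200):
    """Train two weight-sharing branches, one per data domain."""
    shared = ToyEncoderDecoder(n_in, n_out)
    # Both branches refer to the same object, so every parameter is shared.
    branch_caption = shared
    branch_aesthetic = shared
    for _ in range(epochs):
        for (xc, yc), (xa, ya) in zip(caption_data, aesthetic_data):
            grad_c = squared_error_grad(branch_caption, xc, yc)
            grad_a = squared_error_grad(branch_aesthetic, xa, ya)
            weights = shared.params["W"]
            # One update of the shared weights using both domains' gradients.
            for i in range(n_out):
                for j in range(n_in):
                    weights[i][j] -= lr * (grad_c[i][j] + grad_a[i][j])
    return shared


def migrate(shared, n_in, n_out):
    """Copy the trained parameters into a fresh standalone model for testing."""
    model = ToyEncoderDecoder(n_in, n_out, seed=1)
    model.params = copy.deepcopy(shared.params)
    return model


# Made-up toy data: one (features, target) pair per domain.
caption_data = [([1.0, 0.0], [1.0])]
aesthetic_data = [([0.0, 1.0], [2.0])]

shared = train_siamese(caption_data, aesthetic_data, n_in=2, n_out=1)
model = migrate(shared, n_in=2, n_out=1)
print(model.forward([1.0, 0.0]), model.forward([0.0, 1.0]))
```

In a real system the two branches would be deep networks fed with Flickr8k and PCCD batches respectively; the sketch keeps only the weight-sharing and parameter-copy steps.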
About the Authors
Song Xinghui: Male, born in 1994, graduated from the School of Computer Science and Technology, TianGong University, with a master's degree. He now works at Guangdong Business and Technology University. His published papers include "CNTK communication optimization based on parameter server" and "Research on gene coexpression network based on RNA-seq data".
Zhu Peipei: Holds a master's degree and now works at Guangdong Business and Technology University.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Song, X., Zhu, P. (2024). Combining Image Caption and Aesthetic Description Using Siamese Network. In: Zhang, Y., Qi, L., Liu, Q., Yin, G., Liu, X. (eds) Proceedings of the 13th International Conference on Computer Engineering and Networks. CENet 2023. Lecture Notes in Electrical Engineering, vol 1125. Springer, Singapore. https://doi.org/10.1007/978-981-99-9239-3_4
DOI: https://doi.org/10.1007/978-981-99-9239-3_4
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-9238-6
Online ISBN: 978-981-99-9239-3
eBook Packages: Engineering, Engineering (R0)