Combining Image Caption and Aesthetic Description Using Siamese Network

Song, Xinghui; Zhu, Peipei

doi:10.1007/978-981-99-9239-3_4

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 1125)

Included in the following conference series:

International Conference on Computer Engineering and Networks

Abstract

In recent decades, research at the intersection of computer vision (CV) and natural language processing (NLP) has grown in popularity, and many researchers have focused on the image captioning task. More recently, academics have become increasingly interested in image aesthetic description, because an image's aesthetics are indicative of its quality level. In this study, we present a technique that performs image description and aesthetic description simultaneously. We use a Siamese network to train on datasets from two data domains: image captioning and image aesthetic description. After training, the learned parameters are transferred back to a conventional encoder-decoder model for testing. For the image captioning task, we chose the Flickr8k dataset to reduce computational cost; for the aesthetic description task, we used the PCCD dataset. The results indicate that our technique can train on datasets from both domains simultaneously and produce both kinds of image description.
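The abstract does not give implementation details, so the following is only a toy sketch of the Siamese idea as the abstract describes it, not the authors' method: two branches hold a reference to one shared set of parameters, each branch is fed from its own data domain, the shared parameters are updated with the sum of both domain losses, and the trained parameters are then copied into a standalone single-branch model for testing. As assumptions, a linear regressor stands in for the encoder-decoder, mean squared error stands in for the captioning loss, and the names `SharedParams`, `Branch`, `train_siamese` and `migrate` are hypothetical.

```python
import copy


class SharedParams:
    """Hypothetical stand-in for the encoder-decoder weights that both Siamese branches share."""

    def __init__(self, dim):
        self.w = [0.0] * dim
        self.b = 0.0


class Branch:
    """One Siamese branch: it holds a reference to (not a copy of) the shared parameters."""

    def __init__(self, params):
        self.params = params

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.params.w, x)) + self.params.b


def mse_grads(branch, batch):
    """Gradient of the mean squared error of one branch on its own domain's batch."""
    n = len(batch)
    gw = [0.0] * len(branch.params.w)
    gb = 0.0
    for x, y in batch:
        err = branch.predict(x) - y
        for i, xi in enumerate(x):
            gw[i] += 2.0 * err * xi / n
        gb += 2.0 * err / n
    return gw, gb


def train_siamese(domain_a, domain_b, dim, steps=500, lr=0.05):
    """Train two weight-sharing branches on two domains by minimising loss_a + loss_b."""
    shared = SharedParams(dim)
    branch_a = Branch(shared)  # e.g. the captioning domain (Flickr8k in the paper)
    branch_b = Branch(shared)  # e.g. the aesthetic domain (PCCD in the paper)
    for _ in range(steps):
        # Both gradients are computed before any update, then applied to the one shared set.
        gw_a, gb_a = mse_grads(branch_a, domain_a)
        gw_b, gb_b = mse_grads(branch_b, domain_b)
        for i in range(dim):
            shared.w[i] -= lr * (gw_a[i] + gw_b[i])
        shared.b -= lr * (gb_a + gb_b)
    return shared


def migrate(shared):
    """Copy the trained shared parameters into a standalone single-branch model for testing."""
    return Branch(copy.deepcopy(shared))


# Toy usage: each synthetic "domain" exercises a different input feature.
ts = (-1.0, -0.5, 0.0, 0.5, 1.0)
caption_like = [([t, 0.0], 2 * t + 1) for t in ts]
aesthetic_like = [([0.0, t], -3 * t + 1) for t in ts]
shared = train_siamese(caption_like, aesthetic_like, dim=2)
model = migrate(shared)
print(round(model.predict([1.0, 0.0]), 3), round(model.predict([0.0, 1.0]), 3))  # 3.0 -2.0
```

Because both branches point at the same `SharedParams` object, a single parameter set ends up fitting both synthetic domains, and the migrated copy is independent of later changes to the shared object. Summing the two losses at every step is one possible schedule; alternating batches between domains would be another, and the abstract does not say which the authors used.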

Similar content being viewed by others

Aesthetic Multi-attributes Captioning Network for Photos

A Comparative Analysis on Image Caption Generator Using Deep Learning Architecture—ResNet and VGG16

Automatic image caption generation using deep learning (article, 01 June 2023)

About the Authors

Song Xinghui: male, born in 1994, received a master's degree from the School of Computer Science and Technology, TianGong University, and now works at Guangdong Business and Technology University. His publications include "CNTK communication optimization based on parameter server" and "Research on gene coexpression network based on RNA-seq data".

Zhu Peipei: holds a master's degree and now works at Guangdong Business and Technology University.

Author information

Authors and Affiliations

School of Artificial Intelligence and Big Data, Guangdong Business and Technology University, Zhaoqing, 526000, Guangdong, China
Xinghui Song & Peipei Zhu

Corresponding author

Correspondence to Xinghui Song.

Editor information

Editors and Affiliations

Wuxi University, Wuxi, Jiangsu, China
Yonghong Zhang
College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, Shandong, China
Lianyong Qi
School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing, Jiangsu, China
Qi Liu
School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
Guangqiang Yin
School of Computing, Edinburgh Napier University, Edinburgh, UK
Xiaodong Liu

About this paper

Cite this paper

Song, X., Zhu, P. (2024). Combining Image Caption and Aesthetic Description Using Siamese Network. In: Zhang, Y., Qi, L., Liu, Q., Yin, G., Liu, X. (eds) Proceedings of the 13th International Conference on Computer Engineering and Networks. CENet 2023. Lecture Notes in Electrical Engineering, vol 1125. Springer, Singapore. https://doi.org/10.1007/978-981-99-9239-3_4

DOI: https://doi.org/10.1007/978-981-99-9239-3_4
Published: 04 January 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-9238-6
Online ISBN: 978-981-99-9239-3
eBook Packages: Engineering, Engineering (R0)
