Abstract
This paper presents a residual network tailored to generating high-quality image captions that faithfully interpret the relevant content of an image. The research develops a Residual Attention Generative Adversarial Network (RAGAN), which incorporates attention-based residual learning into a Generative Adversarial Network (GAN) to improve the diversity and fidelity of the generated captions. By selecting words from attended feature maps, RAGAN generates high-quality captions faster, increases the diversity of the captions produced, and raises language-metric scores. The generator is designed as an encoder-decoder network that operates in an unsupervised manner, with residual learning adopted between the encoder and the decoder. The discriminator is connected to a language-evaluator unit, which feeds its assessment back to the generator and discriminator to either positively or negatively influence the captioning process. Experiments show that the proposed RAGAN outperforms state-of-the-art GAN models.
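The abstract does not give implementation details, but the residual link between encoder and decoder, and the sign-based feedback from the language evaluator, can be illustrated with a minimal sketch. Everything here (layer shapes, the linear encoder/decoder, the tanh evaluator) is a hypothetical stand-in, not the authors' actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(image, W_enc):
    # Encode the image into a feature map (hypothetical linear layer + ReLU).
    return np.maximum(W_enc @ image, 0.0)

def decoder(features, residual, W_dec):
    # Decode word logits; the residual (skip) connection carries the encoder
    # features directly into the decoder input, as in residual learning.
    return W_dec @ (features + residual)

# Hypothetical dimensions: 64-d image vector, 32-d feature map, 10-word vocabulary.
image = rng.standard_normal(64)
W_enc = rng.standard_normal((32, 64))
W_dec = rng.standard_normal((10, 32))

features = encoder(image, W_enc)
logits = decoder(features, features, W_dec)  # residual = encoder output

# A stand-in language evaluator scores the chosen word; the sign of the score
# is fed back to generator and discriminator as a positive or negative signal.
caption_word = int(np.argmax(logits))
evaluator_score = float(np.tanh(logits[caption_word]))
feedback = 1 if evaluator_score > 0 else -1
print(caption_word, feedback)
```

The point of the sketch is only the data flow: the encoder output re-enters the decoder through a skip connection, and a scalar evaluation signal closes the loop back to both adversarial networks.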
Data availability
Data included in article/supplementary material/referenced in article.
Funding
Not applicable.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Human and animal rights
This research does not involve human participants or animals; hence, statements on informed consent and animal welfare do not apply.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Deepak, G., Gali, S., Sonker, A. et al. Automatic image captioning system using a deep learning approach. Soft Comput (2023). https://doi.org/10.1007/s00500-023-08544-8