Search Results
-
Dual visual align-cross attention-based image captioning transformer
Region-based features widely used in image captioning are typically extracted using object detectors like Faster R-CNN. However, the approach has a...
-
Sentimental Visual Captioning using Multimodal Transformer
We propose a new task called sentimental visual captioning that generates captions with the inherent sentiment reflected by the input image or video....
-
ST-VQA: shrinkage transformer with accurate alignment for visual question answering
While transformer-based models have been remarkably successful in the field of visual question answering (VQA), their approaches to achieve vision...
-
RaSTFormer: region-aware spatiotemporal transformer for visual homogenization recognition in short videos
With the surge in network traffic, the homogenization of short video content is becoming increasingly prominent, resulting in low-quality...
-
Relation-wise transformer network and reinforcement learning for visual navigation
The task of object goal navigation is to drive an embodied agent to find the location of a given target using only visual observation. The map...
-
TransFGVC: transformer-based fine-grained visual classification
Fine-grained visual classification (FGVC) aims to identify subcategories of objects within the same superclass. This task is challenging owing to...
-
Visual contextual relationship augmented transformer for image captioning
The image captioning task is among the most important tasks in computer vision. Most existing methods mine more useful contextual information...
-
Multistage attention region supplement transformer for fine-grained visual categorization
Fine-grained image classification employs neural network models to distinguish between instances of different...
-
A transformer based real-time photo captioning framework for visually impaired people with visual attention
In recent years, transformer-based photo captioning frameworks have played a crucial role in improving individuals’ overall well-being, self-reliance, and...
-
Local self-attention in transformer for visual question answering
Visual Question Answering (VQA) is a multimodal task that requires models to understand both textual and visual information. Various VQA models have...
-
Multi-granularity hypergraph-guided transformer learning framework for visual classification
Fine-grained single-label classification tasks aim to distinguish highly similar categories but often overlook inter-category relationships....
-
CLIP-enhanced multimodal machine translation: integrating visual and label features with transformer fusion
Multimodal machine translation is a technique that leverages computer vision to improve the quality of text translation. Most recent multimodal...
-
A robust attention-enhanced network with transformer for visual tracking
Recently, Siamese-based trackers have become particularly popular. The correlation module in these trackers is responsible for fusing the feature...
-
Enhancing Indian sign language recognition through data augmentation and visual transformer
This paper introduces a novel approach to Indian Sign Language Recognition (ISLR) by integrating Keras, Visual Transformers (ViT), and sophisticated...
-
Mini-3DCvT: a lightweight lip-reading method based on 3D convolution visual transformer
Lip-reading has attracted increasing attention in recent years and has broad application prospects in areas such as human–computer...
-
CWC-transformer: a visual transformer approach for compressed whole slide image classification
The rapid development of Artificial Intelligence (AI) technology accelerates the application of computational pathology in clinical decision-making....
-
Multi-modal transformer using two-level visual features for fake news detection
Fake news with multimedia data is ubiquitous on the Internet nowadays, and it is difficult for users to distinguish them. Therefore, it is necessary...
-
Audio-visual speech synthesis using vision transformer–enhanced autoencoders with ensemble of loss functions
Audio-visual speech synthesis (AVSS) has garnered attention in recent years for its utility in the realm of audio-visual learning. AVSS...
-
A Comparative Study of CNN- and Transformer-Based Visual Style Transfer
Vision Transformer has shown impressive performance on image classification tasks. Observing that most existing visual style transfer (VST)...
-
Repformer: a robust shared-encoder dual-pipeline transformer for visual tracking
Siamese-based trackers have achieved outstanding tracking performance. However, in complex scenarios these trackers struggle to adequately integrate...