Search Results

Showing 1-20 of 8,306 results
  1. Dual visual align-cross attention-based image captioning transformer

    Region-based features widely used in image captioning are typically extracted using object detectors like Faster R-CNN. However, the approach has a...

    Yonggong Ren, Jinghan Zhang, ... Dang N. H. Thanh in Multimedia Tools and Applications
    Article 17 May 2024
  2. Sentimental Visual Captioning using Multimodal Transformer

    We propose a new task called sentimental visual captioning that generates captions with the inherent sentiment reflected by the input image or video....

    Article 06 February 2023
  3. ST-VQA: shrinkage transformer with accurate alignment for visual question answering

    While transformer-based models have been remarkably successful in the field of visual question answering (VQA), their approaches to achieve vision...

    Haiying Xia, Richeng Lan, ... Shuxiang Song in Applied Intelligence
    Article 06 May 2023
  4. RaSTFormer: region-aware spatiotemporal transformer for visual homogenization recognition in short videos

    With the surge in network traffic, the homogenization of short video content is becoming increasingly prominent, resulting in low-quality...

    Shuying Zhang, Jing Zhang, ... Li Zhuo in Neural Computing and Applications
    Article 27 March 2024
  5. Relation-wise transformer network and reinforcement learning for visual navigation

    The task of object goal navigation is to drive an embodied agent to find the location of a given target only using visual observation. The mapping...

    Yu He, Kang Zhou in Neural Computing and Applications
    Article Open access 25 April 2024
  6. TransFGVC: transformer-based fine-grained visual classification

    Fine-grained visual classification (FGVC) aims to identify subcategories of objects within the same superclass. This task is challenging owing to...

    Longfeng Shen, Bin Hou, ... Debao Chen in The Visual Computer
    Article 28 June 2024
  7. Visual contextual relationship augmented transformer for image captioning

    The image captioning task is among the most important tasks in computer vision. Most existing methods mine more useful contextual information...

    Qiang Su, Junbo Hu, Zhixin Li in Applied Intelligence
    Article 01 March 2024
  8. Multistage attention region supplement transformer for fine-grained visual categorization

    The classification of fine-grained images using computer technology employs neural network models to distinguish between instances of different...

    Aokun Mei, Hua Huo, ... Ningya Xu in The Visual Computer
    Article 17 June 2024
  9. A transformer based real-time photo captioning framework for visually impaired people with visual attention

    In recent years, transformer-based photo captioning frameworks plays a crucial role in improving individuals’ overall well-being, self-reliance, and...

    Abubeker Kiliyanal Muhammed Kunju, S. Baskar, ... Shafeena Karim A in Multimedia Tools and Applications
    Article 26 March 2024
  10. Local self-attention in transformer for visual question answering

    Visual Question Answering (VQA) is a multimodal task that requires models to understand both textual and visual information. Various VQA models have...

    Xiang Shen, Dezhi Han, ... Gaofeng Luo in Applied Intelligence
    Article 15 December 2022
  11. Multi-granularity hypergraph-guided transformer learning framework for visual classification

    Fine-grained single-label classification tasks aim to distinguish highly similar categories but often overlook inter-category relationships....

    Jianjian Jiang, Ziwei Chen, ... Xiaochen Yuan in The Visual Computer
    Article 28 June 2024
  12. CLIP-enhanced multimodal machine translation: integrating visual and label features with transformer fusion

    Multimodal machine translation is a technique that leverages computer vision to improve the quality of text translation. Most recent multimodal...

    ShaoDong Cui, Xinyan Yin, ... Hiroyuki Shinnou in Multimedia Tools and Applications
    Article 05 June 2024
  13. A robust attention-enhanced network with transformer for visual tracking

    Recently, Siamese-based trackers have become particularly popular. The correlation module in these trackers is responsible for fusing the feature...

    Fengwei Gu, Jun Lu, Chengtao Cai in Multimedia Tools and Applications
    Article 31 March 2023
  14. Enhancing Indian sign language recognition through data augmentation and visual transformer

    This paper introduces a novel approach to Indian Sign Language Recognition (ISLR) by integrating Keras, Visual Transformers (ViT), and sophisticated...

    Venus Singla, Seema Bawa, Jasmeet Singh in Neural Computing and Applications
    Article 13 May 2024
  15. Mini-3DCvT: a lightweight lip-reading method based on 3D convolution visual transformer

    Lip-reading has attracted more and more attention in recent years, and has wide application prospects and value in areas such as human–computer...

    Huijuan Wang, Boyan Cui, ... Jie Zhu in The Visual Computer
    Article 11 June 2024
  16. CWC-transformer: a visual transformer approach for compressed whole slide image classification

    The rapid development of Artificial Intelligence (AI) technology accelerates the application of computational pathology in clinical decision-making....

    Yaowei Wang, Jing Guo, ... Kelong Wang in Neural Computing and Applications
    Article 10 January 2023
  17. Multi-modal transformer using two-level visual features for fake news detection

    Fake news with multimedia data is ubiquitous on the Internet nowadays, and it is difficult for users to distinguish them. Therefore, it is necessary...

    Bin Wang, Yong Feng, ... Bao-hua Qiang in Applied Intelligence
    Article 18 August 2022
  18. Audio-visual speech synthesis using vision transformer–enhanced autoencoders with ensemble of loss functions

    Audio-visual speech synthesis (AVSS) has garnered attention in recent years for its utility in the realm of audio-visual learning. AVSS...

    Subhayu Ghosh, Snehashis Sarkar, ... Nanda Dulal Jana in Applied Intelligence
    Article 27 March 2024
  19. A Comparative Study of CNN- and Transformer-Based Visual Style Transfer

    Vision Transformer has shown impressive performance on the image classification tasks. Observing that most existing visual style transfer (VST)...

    Hua-Peng Wei, Ying-Ying Deng, ... Wei-Ming Dong in Journal of Computer Science and Technology
    Article 31 May 2022
  20. Repformer: a robust shared-encoder dual-pipeline transformer for visual tracking

    Siamese-based trackers have achieved outstanding tracking performance. However, these trackers in complex scenarios struggle to adequately integrate...

    Fengwei Gu, Jun Lu, ... Zhaojie Ju in Neural Computing and Applications
    Article 22 July 2023