Search Results

Showing 1-20 of 10,000 results
  1. Visual attention network

    While originally designed for natural language processing tasks, the self-attention mechanism has recently taken various computer vision areas by...

    Meng-Hao Guo, Cheng-Ze Lu, ... Shi-Min Hu in Computational Visual Media
    Article Open access 28 July 2023
  2. MANet: Mixed Attention Network for Visual Explanation

    Various visual explanation methods, such as CAM and Grad-CAM, have been proposed to visualize and interpret predictions made by CNNs. Recent efforts...

    **g**g Bai, Yoshinobu Kawahara in New Generation Computing
    Article Open access 23 May 2024
  3. Multimodal attention-driven visual question answering for Malayalam

    Visual question answering is a challenging task that necessitates sophisticated reasoning over the visual elements to provide an accurate answer...

    Abhishek Gopinath Kovath, Anand Nayyar, O. K. Sikha in Neural Computing and Applications
    Article 10 May 2024
  4. Advanced Visual and Textual Co-context Aware Attention Network with Dependent Multimodal Fusion Block for Visual Question Answering

    Visual question answering (VQA) is a multimodal task requiring a simultaneous understanding of both visual and textual content. Therefore, image and...

    Hesam Shokri Asri, Reza Safabakhsh in Multimedia Tools and Applications
    Article 21 March 2024
  5. GVA: guided visual attention approach for automatic image caption generation

    Automated image caption generation with attention mechanisms focuses on visual features including objects, attributes, actions, and scenes of the...

    Md. Bipul Hossen, Zhongfu Ye, ... Md. Imran Hossain in Multimedia Systems
    Article 29 January 2024
  6. Masked co-attention model for audio-visual event localization

    The objective of Audio-Visual Event Localization (AVEL) is to leverage audio and video cues in a combined manner to localize video segments that...

    Hengwei Liu, Xiaodong Gu in Applied Intelligence
    Article 13 January 2024
  7. Dual visual align-cross attention-based image captioning transformer

    Region-based features widely used in image captioning are typically extracted using object detectors like Faster R-CNN. However, the approach has a...

    Yonggong Ren, **ghan Zhang, ... Dang N. H. Thanh in Multimedia Tools and Applications
    Article 17 May 2024
  8. IMCN: Improved modular co-attention networks for visual question answering

    Many existing Visual Question Answering (VQA) methods use traditional attention mechanisms to focus on each region of the input image and each word...

    Cheng Liu, Chao Wang, Yan Peng in Applied Intelligence
    Article 01 March 2024
  9. Video Captioning Based on Cascaded Attention-Guided Visual Feature Fusion

    Video caption generation has become one of the research hotspots in recent years due to its wide range of potential application scenarios. It...

    Shuqin Chen, Li Yang, Yikang Hu in Neural Processing Letters
    Article 25 August 2023
  10. Multimodal Bi-direction Guided Attention Networks for Visual Question Answering

    Visual question answering (VQA) has become a research hotspot in the computer vision and natural language processing fields. A core solution...

    Linqin Cai, Nuoying Xu, ... Haodu Fan in Neural Processing Letters
    Article 13 September 2023
  11. Co-attention graph convolutional network for visual question answering

    Visual Question Answering (VQA) is a challenging task that requires a fine-grained understanding of both the visual content of images and the textual...

    Chuan Liu, Ying-Ying Tan, ... Ming Zhu in Multimedia Systems
    Article 20 June 2023
  12. Multistage attention region supplement transformer for fine-grained visual categorization

    The classification of fine-grained images using computer technology employs neural network models to distinguish between instances of different...

    Aokun Mei, Hua Huo, ... Ningya Xu in The Visual Computer
    Article 17 June 2024
  13. AS-Net: active speaker detection using deep audio-visual attention

    Active Speaker Detection (ASD) aims at identifying the active speaker among multiple speakers in a video scene. Previous ASD models often seek audio...

    Abduljalil Radman, Jorma Laaksonen in Multimedia Tools and Applications
    Article Open access 05 February 2024
  14. Hierarchical cross-modal contextual attention network for visual grounding

    This paper explores the task of visual grounding (VG), which aims to localize regions of an image through sentence queries. The development of VG has...

    Xin Xu, Gang Lv, ... Fudong Nian in Multimedia Systems
    Article 17 April 2023
  15. Local self-attention in transformer for visual question answering

    Visual Question Answering (VQA) is a multimodal task that requires models to understand both textual and visual information. Various VQA models have...

    Xiang Shen, Dezhi Han, ... Gaofeng Luo in Applied Intelligence
    Article 15 December 2022
  16. Dual-feature collaborative relation-attention networks for visual question answering

    Region and grid features extracted by object detection networks, which contain abundant image information, are widely used in visual question...

    Article 04 August 2023
  17. Graph attention network-optimized dynamic monocular visual odometry

    Monocular Visual Odometry (VO) is often formulated as a sequential dynamics problem that relies on the scene rigidity assumption. One of the main...

    Zhao Hongru, Qiao Xiuquan in Applied Intelligence
    Article 05 July 2023
  18. Hierarchical Attention Networks for Fact-based Visual Question Answering

    Fact-based Visual Question Answering (FVQA) aims to answer questions with images and facts. It requires a fine-grained and simultaneous understanding...

    Haibo Yao, Yongkang Luo, ... Chengtao Cai in Multimedia Tools and Applications
    Article 22 July 2023
  19. Multi-scale network with shared cross-attention for audio–visual correlation learning

    Cross-modal audio–visual correlation learning has been an interesting research topic, which aims to capture and understand semantic correspondences...

    Jiwei Zhang, Yi Yu, ... Jianming Wu in Neural Computing and Applications
    Article 19 July 2023
  20. Cross-modal attention guided visual reasoning for referring image segmentation

    The goal of referring image segmentation (RIS) is to generate the foreground mask of the object described by a natural language expression. The key...

    Wen**g Zhang, Mengnan Hu, ... Rong Wang in Multimedia Tools and Applications
    Article 01 March 2023