Search Results

Showing 41-60 of 10,000 results
  1. VISIONE 5.0: Enhanced User Interface and AI Models for VBS2024

    In this paper, we introduce the fifth release of VISIONE, an advanced video retrieval system offering diverse search functionalities. The user can...
    Giuseppe Amato, Paolo Bolettieri, ... Claudio Vairo in MultiMedia Modeling
    Conference paper 2024
  2. DiveXplore at the Video Browser Showdown 2024

    According to our experience from VBS2023 and the feedback from the IVR4B special session at CBMI2023, we have largely revised the diveXplore system...
    Klaus Schoeffmann, Sahar Nasirihaghighi in MultiMedia Modeling
    Conference paper 2024
  3. Contextual Augmentation with Bias Adaptive for Few-Shot Video Object Segmentation

    Few-shot video object segmentation (FSVOS) is a challenging task that aims to segment new object classes across query videos with limited annotated...
    Shuaiwei Wang, Zhao Liu, ... Ronghua Liang in MultiMedia Modeling
    Conference paper 2024
  4. Face Forgery Detection via Texture and Saliency Enhancement

    In recent years, AI-driven advancements have resulted in increasingly sophisticated face forgery techniques, posing a challenge in distinguishing...
    Sizheng Guo, Haozhe Yang, Xianming Lin in MultiMedia Modeling
    Conference paper 2024
  5. Cross-Modal Semantic Alignment Learning for Text-Based Person Search

    Text-based person search aims to retrieve pedestrian images corresponding to a specific identity based on a textual description. Existing methods...
    Wenjun Gan, Jiawei Liu, ... Zheng-Jun Zha in MultiMedia Modeling
    Conference paper 2024
  6. Dive into Coarse-to-Fine Strategy in Single Image Deblurring

    The coarse-to-fine approach has gained significant popularity in the design of networks for single image deblurring. Traditional methods used to...
    Zebin Li, Jianping Luo in MultiMedia Modeling
    Conference paper 2024
  7. CLF-Net: A Few-Shot Cross-Language Font Generation Method

    Designing a font library takes a lot of time and effort. Few-shot font generation aims to generate a new font library by referring to only a few...
    Qianqian Jin, Fazhi He, Wei Tang in MultiMedia Modeling
    Conference paper 2024
  8. Advancing Incremental Few-Shot Semantic Segmentation via Semantic-Guided Relation Alignment and Adaptation

    Incremental few-shot semantic segmentation aims to extend a semantic segmentation model to novel classes according to only a few labeled data, while...
    Yuan Zhou, Xin Chen, ... Qi Tian in MultiMedia Modeling
    Conference paper 2024
  9. MAVAR-SE: Multi-scale Audio-Visual Association Representation Network for End-to-End Speaker Extraction

    Speaker extraction to separate the target speech from the mixed audio is a problem worth studying in the speech separation field. Since human...
    Shilong Yu, Chenhui Yang in MultiMedia Modeling
    Conference paper 2024
  10. A Lightweight Local Attention Network for Image Super-Resolution

    For many years, deep neural networks have been used for Single Image Super-resolution (SISR) tasks. However, more extensive networks require higher...
    Feng Chen, Xin Song, Liang Zhu in MultiMedia Modeling
    Conference paper 2024
  11. Find the Cliffhanger: Multi-modal Trailerness in Soap Operas

    Creating a trailer requires carefully picking out and piecing together brief enticing moments out of a longer video, making it a challenging and...
    Carlo Bretti, Pascal Mettes, ... Nanne van Noord in MultiMedia Modeling
    Conference paper 2024
  12. Dynamic-Static Graph Convolutional Network for Video-Based Facial Expression Recognition

    Most of the current methods for video-based facial expression recognition (FER) in the wild are based on deep neural networks with attention...
    Fahong Wang, Zhao Liu, ... Ronghua Liang in MultiMedia Modeling
    Conference paper 2024
  13. Multi-head Hashing with Orthogonal Decomposition for Cross-modal Retrieval

    Recently, cross-modal hashing has become a promising line of research in cross-modal retrieval. It not only takes advantage of complementary multiple...
    Wei Liu, Jun Li, ... Bo Yang in MultiMedia Modeling
    Conference paper 2024
  14. Multi-scale Decomposition Dehazing with Polarimetric Vision

    In this paper, the problem of simultaneous image dehazing of near and far scenes in hazy weather is addressed. We propose the multi-scale...
    Tongwei Ma, Lilian Zhang, ... Chen Fan in MultiMedia Modeling
    Conference paper 2024
  15. Super-Resolution-Assisted Feature Refined Extraction for Small Objects in Remote Sensing Images

    Despite achieving impressive results in object detection in natural scenes, the task of object detection in remote sensing images is still full of...
    Lihua Du, Wei Wu, Chen Li in MultiMedia Modeling
    Conference paper 2024
  16. Two-Stage Reasoning Network with Modality Decomposition for Text VQA

    Text-based Visual Question Answering (Text VQA) is a challenging task that requires a comprehensive understanding of scene texts in an image. Scene...
    Shengrong Ling, Sisi You, Bing-Kun Bao in MultiMedia Modeling
    Conference paper 2024
  17. Localization and Local Motion Magnification of Pulsatile Regions in Endoscopic Surgery Videos

    Localization of neurovascular bundles or vessels is critical in endoscopic surgery. It still remains challenging to identify neurovascular bundles...
    Honglei Zheng, Wenkang Fan, ... Xiongbiao Luo in MultiMedia Modeling
    Conference paper 2024
  18. A Coarse and Fine Grained Masking Approach for Video-Grounded Dialogue

    The task of Video-Grounded Dialogue involves developing a multimodal chatbot capable of answering sequential questions from humans regarding video...
    Feifei Xu, Wang Zhou, ... Guangzhen Li in MultiMedia Modeling
    Conference paper 2024
  19. Co-speech Gesture Generation with Variational Auto Encoder

    The research field of generating natural gestures from speech input is called co-speech gesture generation. Co-speech generation methods should...
    Shinichi Ka, Koichi Shinoda in MultiMedia Modeling
    Conference paper 2024
  20. LRATNet: Local-Relationship-Aware Transformer Network for Table Structure Recognition

    Table structure recognition is a challenging task due to complex background and various styles of tables. Existing methods address this challenge by...
    Guangjie Yang, Dajian Zhong, ... Hongjian Zhan in MultiMedia Modeling
    Conference paper 2024