Search Results
- MoPE: Mixture of Pooling Experts Framework for Image-Text Retrieval
  Image-text retrieval is a fundamental and crucial task in the field of multimodal interaction, which assists internet users in retrieving the...
- Differentiable Neural Architecture Search Based on Efficient Architecture for Lightweight Image Super-Resolution
  With the advancement of deep neural networks, image Super-Resolution (SR) has witnessed remarkable improvements in performance. However, the...
- A Language-Based Solution to Enable Metaverse Retrieval
  Recently, the Metaverse is becoming increasingly attractive, with millions of users accessing the many available virtual worlds. However, how do...
- Sustainable Commercial Fishery Control Using Multimedia Forensics Data from Non-trusted, Mobile Edge Nodes
  Uncontrolled over-fishing has been exemplified by the UN as a serious ecological challenge and a major threat to sustainable food supplies. Emerging...
- A Region Based Non-overlapping Reference Speech Estimation Method for Speaker Extraction
  Speaker extraction is a technique that separates the target speech from multi-talker mixtures using a priori information about the target speaker,...
- Multi-modal Video Topic Segmentation with Dual-Contrastive Domain Adaptation
  Video topic segmentation unveils the coarse-grained semantic structure underlying videos and is essential for other video understanding tasks. Given...
- Audio-Visual Segmentation by Leveraging Multi-scaled Features Learning
  Audio-visual segmentation with semantics (AVSS) is an advanced approach that enriches Audio-visual segmentation (AVS) by incorporating object...
- SM-GAN: Single-Stage and Multi-object Text Guided Image Editing
  In recent years, text-guided scene image manipulation has received extensive attention in the computer vision community. Most of the existing...
- A Secure and Fair Federated Learning Protocol Under the Universal Composability Framework
  Federated learning is a paradigm of distributed machine learning that enables multiple participants to collaboratively train a global model while...
- NearbyPatchCL: Leveraging Nearby Patches for Self-supervised Patch-Level Multi-class Classification in Whole-Slide Images
  Whole-slide image (WSI) analysis plays a crucial role in cancer diagnosis and treatment. In addressing the demands of this critical task,...
- Joint Image Data Hiding and Rate-Distortion Optimization in Neural Compressed Latent Representations
  We present an end-to-end learned image data hiding framework that embeds and extracts secrets in the latent representations of a neural compressor....
- Hierarchical Supervised Contrastive Learning for Multimodal Sentiment Analysis
  Multimodal sentiment analysis (MSA) is dedicated to deciphering human emotions in videos. It is a challenging task due to the semantic disparities...
- Semantic Importance-Based Deep Image Compression Using a Generative Approach
  Semantic image compression can greatly reduce the amount of transmitted data by representing and reconstructing images using semantic information....
- SEAS-Net: Segment Exchange Augmentation for Semi-supervised Brain Tumor Segmentation
  Accurate segmentation of brain tumors is crucial for cancer diagnosis, treatment planning, and evaluation. However, semi-supervised brain tumor image...
- MRHF: Multi-stage Retrieval and Hierarchical Fusion for Textbook Question Answering
  Textbook question answering is challenging as it aims to automatically answer various questions on textbook lessons with long text and complex...
- Multi-task Collaborative Network for Image-Text Retrieval
  Image-text retrieval aims to capture semantic relevance between images and texts. Most existing approaches rely solely on the image-text pairs to...
- LigCDnet: Remote Sensing Image Cloud Detection Based on Lightweight Framework
  Cloud contamination is inevitable in remote sensing images, resulting in a large number of images that cannot be applied in various fields....
- Unsupervised Multi-collaborative Learning Network for 3D Face Reconstruction
  Monocular image-based 3D fine face reconstruction techniques aim to reconstruct 3D faces with rich face details from a single image. Existing methods...
- FGENet: Fine-Grained Extraction Network for Congested Crowd Counting
  Crowd counting has gained significant popularity due to its practical applications. However, mainstream counting methods ignore precise individual...
- Pseudo-label Based Unsupervised Momentum Representation Learning for Multi-domain Image Retrieval
  Although many current cross-domain image retrieval researches have made good progress, most of the works is targeted at specific domains. At the same...