Search Results
-
See, move and hear: a local-to-global multi-modal interaction network for video action recognition
Visual and audio signals are concurrent and complementary modalities in some video actions. A single visual modality limits the performance of...
-
CMC-MMR: multi-modal recommendation model with cross-modal correction
Multi-modal recommendation using multi-modal features (e.g., image and text features) has received significant attention and has been shown to have...
-
Intuitive Multi-modal Human-Robot Interaction via Posture and Voice
Collaborative robots promise to greatly improve the quality of life for the aging population and to ease elder care. However, existing systems...
-
MutualFormer: Multi-modal Representation Learning via Cross-Diffusion Attention
Aggregating multi-modal data to obtain reliable data representation attracts more and more attention. Recent studies demonstrate that Transformer...
-
Align vision-language semantics by multi-task learning for multi-modal summarization
Most current multi-modal summarization methods follow a cascaded manner, where an off-the-shelf object detector is first used to extract visual...
-
MADMM: microservice system anomaly detection via multi-modal data and multi-feature extraction
Accurately detecting anomalies in microservice systems is crucial to avoid system failures and economic losses for users. Existing approaches detect...
-
The effectiveness of children’s English enlightenment network teaching based on multi-modal teaching model
To enhance the efficacy of traditional English enlightenment education for children, this research delves into a multi-modal teaching approach and...
-
Unsupervised multi-perspective fusing semantic alignment for cross-modal hashing retrieval
Due to their low computational cost, excellent storage capacity, and efficient retrieval performance, unsupervised deep cross-modal hashing methods...
-
Visual Question Generation Under Multi-granularity Cross-Modal Interaction
Visual question generation (VQG) aims to ask human-like questions automatically from input images, targeting given answers. A key issue of VQG is...
-
Enhanced Entity Interaction Modeling for Multi-Modal Entity Alignment
Multi-modal Entity Alignment (MMEA) aims to find equivalent entities across different multi-modal knowledge graphs (MMKGs). Most existing methods...
-
Multi-Modal Co-Attention Capsule Network for Fake News Detection
Most existing fake news identification models have recently focused on exploiting multi-modal features to enhance performance. This...
-
Personalizing Multi-modal Human-Robot Interaction Using Adaptive Robot Behavior
Technological advances favor longer life expectancy, changing the population distribution. This change is known as population aging and brings a lack of care...
-
MEDMCN: a novel multi-modal EfficientDet with multi-scale CapsNet for object detection
Object detection in real-world scenarios with multi-modal inputs is crucial for some safety-critical systems, such as autonomous driving, security...
-
Multi-modal visual tracking: Review and experimental comparison
Visual object tracking has been drawing increasing attention in recent years, as a fundamental task in computer vision. To extend the range of...
-
Multi-modal bilinear fusion with hybrid attention mechanism for multi-label skin lesion classification
Skin cancer is one of the most prevalent malignancies in the world. Deep learning-based methods have been successfully used for skin disease...
-
Online video visual relation detection with hierarchical multi-modal fusion
With the development of artificial intelligence technology, visual scene understanding has become a hot research topic. Online visual relation...
-
MSTCNN: multi-modal spatio-temporal convolutional neural network for pedestrian trajectory prediction
Pedestrian trajectory prediction is of great significance in correctly planning a reasonable path. Most of the existing trajectory prediction methods...
-
PointCMC: cross-modal multi-scale correspondences learning for point cloud understanding
Existing cross-modal frameworks have achieved impressive performance in point cloud object representations learning, where a 2D image encoder is...
-
MCCP: multi-modal fashion compatibility and conditional preference model for personalized clothing recommendation
Personalized clothing recommendation remains challenging due to the richness of fashion item representations, the non-uniqueness of fashion...
-
Semantic enhancement and multi-level alignment network for cross-modal retrieval
Cross-modal retrieval aims to address heterogeneity and cross-modal semantic associations between multimedia data of different modalities. Image-text...