Search Results
-
See, move and hear: a local-to-global multi-modal interaction network for video action recognition
Visual and audio signals are concurrent and complementary modalities in some video actions. A single visual modality limits the performance of...
-
CMC-MMR: multi-modal recommendation model with cross-modal correction
Multi-modal recommendation using multi-modal features (e.g., image and text features) has received significant attention and has been shown to have...
-
Intuitive Multi-modal Human-Robot Interaction via Posture and Voice
Collaborative robots promise to greatly improve the quality of life for the aging population and to ease elder care. However, existing systems...
-
MutualFormer: Multi-modal Representation Learning via Cross-Diffusion Attention
Aggregating multi-modal data to obtain reliable data representation attracts more and more attention. Recent studies demonstrate that Transformer...
-
Align vision-language semantics by multi-task learning for multi-modal summarization
Most current multi-modal summarization methods follow a cascaded manner, where an off-the-shelf object detector is first used to extract visual...
-
MADMM: microservice system anomaly detection via multi-modal data and multi-feature extraction
Accurately detecting anomalies in microservice systems is crucial to avoid system failures and economic losses for users. Existing approaches detect...
-
The effectiveness of children’s English enlightenment network teaching based on multi-modal teaching model
To enhance the efficacy of traditional English enlightenment education for children, this research delves into a multi-modal teaching approach and...
-
Unsupervised multi-perspective fusing semantic alignment for cross-modal hashing retrieval
Due to their low computational cost, excellent storage capacity, and efficient retrieval performance, unsupervised deep cross-modal hashing methods...
-
Visual Question Generation Under Multi-granularity Cross-Modal Interaction
Visual question generation (VQG) aims to ask human-like questions automatically from input images, targeting given answers. A key issue of VQG is...
-
Enhanced Entity Interaction Modeling for Multi-Modal Entity Alignment
Multi-modal Entity Alignment (MMEA) aims to find equivalent entities across different multi-modal knowledge graphs (MMKGs). Most existing methods...
-
Multi-Modal Co-Attention Capsule Network for Fake News Detection
Most existing fake news identification models have recently focused on exploiting multi-modal features to enhance performance. This...
-
Personalizing Multi-modal Human-Robot Interaction Using Adaptive Robot Behavior
Technological advances favor longer life expectancy, changing the population distribution. This change is known as population aging and brings a lack of care...
-
MEDMCN: a novel multi-modal EfficientDet with multi-scale CapsNet for object detection
Object detection in real-world scenarios with multi-modal inputs is crucial for some safety-critical systems, such as autonomous driving, security...
-
Multi-modal visual tracking: Review and experimental comparison
Visual object tracking has been drawing increasing attention in recent years, as a fundamental task in computer vision. To extend the range of...
-
Multi-modal bilinear fusion with hybrid attention mechanism for multi-label skin lesion classification
Skin cancer is one of the most prevalent malignancies in the world. Deep learning-based methods have been successfully used for skin disease...
-
Online video visual relation detection with hierarchical multi-modal fusion
With the development of artificial intelligence technology, visual scene understanding has become a hot research topic. Online visual relation...
-
MSTCNN: multi-modal spatio-temporal convolutional neural network for pedestrian trajectory prediction
Pedestrian trajectory prediction is of great significance in correctly planning a reasonable path. Most of the existing trajectory prediction methods...
-
PointCMC: cross-modal multi-scale correspondences learning for point cloud understanding
Existing cross-modal frameworks have achieved impressive performance in point cloud object representations learning, where a 2D image encoder is...
-
MCCP: multi-modal fashion compatibility and conditional preference model for personalized clothing recommendation
Personalized clothing recommendation remains challenging due to the richness of fashion item representations, the non-uniqueness of fashion...
-
Semantic enhancement and multi-level alignment network for cross-modal retrieval
Cross-modal retrieval aims to address heterogeneity and cross-modal semantic associations between multimedia data of different modalities. Image-text...