Search Results
Enhancing Semantics-Driven Recommender Systems with Visual Features
Content-based semantics-driven recommender systems are often used in the small-scale news recommendation domain, founded on the TF-IDF measure but...
FaSRnet: a feature and semantics refinement network for human pose estimation
Due to factors such as motion blur, out-of-focus video, and occlusion, multi-frame human pose estimation is a challenging task. Exploiting temporal...
Enhancing Visual Question Answering with Generated Image Caption
Visual Question Answering (VQA) poses a formidable challenge, necessitating computer systems to proficiently execute essential computer vision tasks,...
A Cross-Modal View to Utilize Label Semantics for Enhancing Student Network in Multi-label Classification
Knowledge transfer has become a promising approach for improving the performance and efficiency of relatively lightweight networks. Previous research...
Visual and language semantic hybrid enhancement and complementary for video description
It is a fundamental task of computer vision to describe and express the visual content of a video in natural language, which not only highly...
Zero-shot image classification via Visual–Semantic Feature Decoupling
Zero-shot image classification refers to the use of labeled images to train a classification model that can correctly classify images of unseen...
Audio-Visual Segmentation by Leveraging Multi-scaled Features Learning
Audio-visual segmentation with semantics (AVSS) is an advanced approach that enriches audio-visual segmentation (AVS) by incorporating object...
Enhancing Fairness of Visual Attribute Predictors
The performance of deep neural networks for image recognition tasks such as predicting a smiling face is known to degrade with under-represented...
Mutually guided learning of global semantics and local representations for image restoration
The global semantics and the local scene representation are crucial for image restoration. Although existing methods have proposed various hybrid...
Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking
Masked Autoencoders (MAE) have been popular paradigms for large-scale vision representation pre-training. However, MAE solely reconstructs the...
Multi-granularity hypergraph-guided transformer learning framework for visual classification
Fine-grained single-label classification tasks aim to distinguish highly similar categories but often overlook inter-category relationships....
End-to-End Image Compression Through Machine Semantics
With the increasing demand for AI automated analysis, machine semantics have replaced signals as a new focus in visual information compression. In...
LGVC: language-guided visual context modeling for 3D visual grounding
3D visual grounding is crucial for understanding cross-modal scenes, linking visual objects to their corresponding language descriptions. Traditional...
SLOD2+WIN: semantics-aware addition and LoD of 3D window details for LoD2 CityGML models with textures
In many urban planning and visualization applications, it is crucial to have 3D window details. However, the process of acquiring and reconstructing...
Contrastive learning for unsupervised sentence embeddings using negative samples with diminished semantics
Unsupervised learning has made significant progress in recent years, driven by advancements in contrastive learning. However, current methods for...
GViG: Generative Visual Grounding Using Prompt-Based Language Modeling for Visual Question Answering
The WSDM 2023 Toloka VQA challenge introduces a new Grounding-based Visual Question Answering (GVQA) dataset, elevating multimodal task complexity....
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-Text Retrieval
Dominant pre-training work for video-text retrieval mainly adopts "dual-encoder" architectures to enable efficient retrieval, where two separate...
Unimodal-Multimodal Collaborative Enhancement for Audio-Visual Event Localization
The audio-visual event localization (AVE) task focuses on localizing audio-visual events where event signals occur in both audio and visual modalities....
An Effective Pre-trained Visual Encoder for Medical Visual Question Answering
Medical Visual Question Answering (Med-VQA) is a domain-specific task that answers a given clinical question regarding a radiology image. It requires...
Indirect visual–semantic alignment for generalized zero-shot recognition
Our paper addresses the challenge of generalized zero-shot learning, where the label of a target image may belong to either a seen or an unseen...