Search Results - Springer

Article

Progressive spatial–temporal transfer model for unsupervised person re-identification

Over the past decade, a more widespread area of computer vision research has been person re-identification (P-Reid). This technology is applied in fields such as pedestrian tracking, security, and video survei...

Shuren Zhou, Zhixiong Li, Jie Liu… in International Journal of Multimedia Inform… (2024)

Article

Parameter-efficient tuning of cross-modal retrieval for a specific database via trainable textual and visual prompts

A novel cross-modal image retrieval method realized by parameter efficiently tuning a pre-trained cross-modal model is proposed in this study. Conventional cross-modal retrieval methods realize text-to-image r...

Huaying Zhang, Rintaro Yanagi, Ren Togo… in International Journal of Multimedia Inform… (2024)

Article

PSNet: position-shift alignment network for image caption

Recently, Transformer-based models have gained increasing popularity in the field of image captioning. The global attention mechanism of the Transformer facilitates the integration of region and grid features,...

Lixia Xue, Awen Zhang, Ronggui Wang… in International Journal of Multimedia Inform… (2023)

Article

A lightweight small object detection algorithm based on improved YOLOv5 for driving scenarios

Small object detection has been a longstanding challenge in the field of object detection, and achieving high detection accuracy is crucial for autonomous driving, especially for small objects. This article fo...

Zonghui Wen, Jia Su, Yongxiang Zhang… in International Journal of Multimedia Inform… (2023)

Article

CoCoOpter: Pre-train, prompt, and fine-tune the vision-language model for few-shot image classification

Few-shot image classification aims at learning to generalize to unseen new categories from a few training samples. Transfer learning is one prominent approach to the task, which first learns a backbone from th...

Jie Yan, Yuxiang Xie, Yanming Guo… in International Journal of Multimedia Inform… (2023)

Article

Style-aware adversarial pairwise ranking for image recommendation systems

The vulnerability of Machine Learning (ML) models to adversarial attack and their prominence pose security issues, notably in image recommendation systems. The adversarial training method is an excellent strat...

Zhefu Wu, Song Zhang, Agyemang Paul… in International Journal of Multimedia Inform… (2023)

Chapter and Conference Paper

CMC_v2: Towards More Accurate COVID-19 Detection with Discriminative Video Priors

This paper presents our solution for the 2nd COVID-19 Competition, occurring in the framework of the AIMIA Workshop at the European Conference on Computer Vision (ECCV 2022). In our approach, we employ the win...

Junlin Hou, Jilan Xu, Nan Zhang, Yi Wang… in Computer Vision – ECCV 2022 Workshops (2023)

Chapter and Conference Paper

Boosting COVID-19 Severity Detection with Infection-Aware Contrastive Mixup Classification

This paper presents our solution for the 2nd COVID-19 Severity Detection Competition. This task aims to distinguish the Mild, Moderate, Severe, and Critical grades in COVID-19 chest CT images. In our approach,...

Junlin Hou, Jilan Xu, Nan Zhang, Yuejie Zhang… in Computer Vision – ECCV 2022 Workshops (2023)

Chapter and Conference Paper

BiTAT: Neural Network Binarization with Task-Dependent Aggregated Transformation

Neural network quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation, while preserving ...

Geon Park, Jaehong Yoon, Haiyang Zhang, Xing Zhang… in Computer Vision – ECCV 2022 Workshops (2023)

Chapter and Conference Paper

BadDet: Backdoor Attacks on Object Detection

Backdoor attack is a severe security threat which injects a backdoor trigger into a small portion of training data such that the trained model gives incorrect predictions when the specific trigger appears. Whi...

Shih-Han Chan, Yinpeng Dong, Jun Zhu, Xiaolu Zhang… in Computer Vision – ECCV 2022 Workshops (2023)

Chapter and Conference Paper

Hydra Attention: Efficient Attention with Many Heads

While transformers have begun to dominate many tasks in vision, applying them to large images is still computationally difficult. A large reason for this is that self-attention scales quadratically with the nu...

Daniel Bolya, Cheng-Yang Fu, Xiaoliang Dai… in Computer Vision – ECCV 2022 Workshops (2023)

Chapter and Conference Paper

An Improved Lightweight Network Based on YOLOv5s for Object Detection in Autonomous Driving

Object detection with high accuracy and fast inference speed based on camera sensors is important for autonomous driving. This paper develops a lightweight object detection network based on YOLOv5s which is on...

Guofa Li, Yingjie Zhang, Delin Ouyang, Xingda Qu in Computer Vision – ECCV 2022 Workshops (2023)

Chapter and Conference Paper

RPR-Net: A Point Cloud-Based Rotation-Aware Large Scale Place Recognition Network

Point cloud-based large scale place recognition is an important but challenging task for many applications such as Simultaneous Localization and Mapping (SLAM). Taking the task as a point cloud retrieval probl...

Zhaoxin Fan, Zhenbo Song, Wenping Zhang… in Computer Vision – ECCV 2022 Workshops (2023)

Article

Prototype local–global alignment network for image–text retrieval

Image–text retrieval is a challenging task due to the requirement of thorough multimodal understanding and precise inter-modality relationship discovery. However, most previous approaches resort to doing globa...

Lingtao Meng, Feifei Zhang, Xi Zhang… in International Journal of Multimedia Inform… (2022)

Article

FCT: fusing CNN and transformer for scene classification

Scene classification based on convolutional neural networks (CNNs) has achieved great success in recent years. In CNNs, the convolution operation performs well in extracting local features, but its ability to ...

Yuxiang Xie, Jie Yan, Lai Kang, Yanming Guo… in International Journal of Multimedia Inform… (2022)

Article

Your heart rate betrays you: multimodal learning with spatio-temporal fusion networks for micro-expression recognition

Micro-expressions can convey feelings that people are trying to hide. At present, some studies on micro-expression, most of which only use the temporal or spatial information in the image to recognize micro-ex...

Ren Zhang, Ning He, Shengjie Liu, Ying Wu… in International Journal of Multimedia Inform… (2022)

Article

TCKGE: Transformers with contrastive learning for knowledge graph embedding

Representation learning of knowledge graphs has emerged as a powerful technique for various downstream tasks. In recent years, numerous research efforts have been made for knowledge graphs embedding. However, ...

Xiaowei Zhang, Quan Fang, Jun Hu… in International Journal of Multimedia Inform… (2022)

Article

Multi-aware coreference relation network for visual dialog

As a challenging cross-media task, visual dialog assesses whether an AI agent can converse in human language based on its understanding of visual content. So the critical issue is to pay attention not only to ...

Zefan Zhang, Tianling Jiang, Chunping Liu… in International Journal of Multimedia Inform… (2022)

Chapter and Conference Paper

Chair Design of Waiting Space in Maternity Department Based on QFD-Kano and FBS

In order to design the seats in the waiting space of the obstetrics and gynecology department of the hospital, reduce the anxiety of pregnant women waiting for the inspection process, objectively and rationall...

Lin Zhang, Zhang Zhang in HCI International 2022 Posters (2022)

Chapter and Conference Paper

TransVLAD: Focusing on Locally Aggregated Descriptors for Few-Shot Learning

This paper presents a transformer framework for few-shot learning, termed TransVLAD, with one focus showing the power of locally aggregated descriptors for few-shot learning. Our TransVLAD model is simple: a s...

Haoquan Li, Laoming Zhang, Daoan Zhang, Lang Fu, Peng Yang… in Computer Vision – ECCV 2022 (2022)

1,803 Result(s)
