Search Results - Springer

Article

Hugs Bring Double Benefits: Unsupervised Cross-Modal Hashing with Multi-granularity Aligned Transformers

Unsupervised cross-modal hashing (UCMH) has been commonly explored to support large-scale cross-modal retrieval of unlabeled data. Despite promising progress, most existing approaches are developed on convolut...

Jinpeng Wang, Ziyun Zeng, Bin Chen, Yuting Wang… in International Journal of Computer Vision (2024)
Article

Softmax-Free Linear Transformers

Vision transformers (ViTs) have pushed the state-of-the-art for visual perception tasks. The self-attention mechanism underpinning the strength of ViTs has a quadratic complexity in both computation and memory...

Jiachen Lu, Junge Zhang, Xiatian Zhu… in International Journal of Computer Vision (2024)
Article

Does Confusion Really Hurt Novel Class Discovery?

When sampling data of specific classes (i.e., known classes) for a scientific task, collectors may encounter unknown classes (i.e., novel classes). Since these novel classes might be valuable for future research,...

Haoang Chi, Wenjing Yang, Feng Liu, Long Lan… in International Journal of Computer Vision (2024)
Article

Diff-Font: Diffusion Model for Robust One-Shot Font Generation

Font generation presents a significant challenge due to the intricate details needed, especially for languages with complex ideograms and numerous characters, such as Chinese and Korean. Although various few-s...

Haibin He, Xinyuan Chen, Chaoyue Wang… in International Journal of Computer Vision (2024)
Article

Grounded Affordance from Exocentric View

Affordance grounding aims to locate objects’ “action possibilities” regions, an essential step toward embodied intelligence. Due to the diversity of interactive affordance, i.e., the uniqueness of different indiv...

Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao… in International Journal of Computer Vision (2024)
Article

Delving into Identify-Emphasize Paradigm for Combating Unknown Bias

Dataset biases are notoriously detrimental to model robustness and generalization. The identify-emphasize paradigm appears to be effective in dealing with unknown biases. However, we discover that it is still ...

Bowen Zhao, Chen Chen, Qian-Wei Wang, Anfeng He… in International Journal of Computer Vision (2024)
Article

Towards Defending Multiple \(\ell_p\)-Norm Bounded Adversarial Perturbations via Gated Batch Normalization

There has been extensive evidence demonstrating that deep neural networks are vulnerable to adversarial examples, which motivates the development of defenses against adversarial attacks. Existing adversarial d...

Aishan Liu, Shiyu Tang, Xinyun Chen, Lei Huang… in International Journal of Computer Vision (2024)
Article

GridFormer: Residual Dense Transformer with Grid Structure for Image Restoration in Adverse Weather Conditions

Image restoration in adverse weather conditions is a difficult task in computer vision. In this paper, we propose a novel transformer-based framework called GridFormer which serves as a backbone for image rest...

Tao Wang, Kaihao Zhang, Ziqian Shao, Wenhan Luo… in International Journal of Computer Vision (2024)
Article

Convex–Concave Tensor Robust Principal Component Analysis

Tensor robust principal component analysis (TRPCA) aims at recovering the underlying low-rank clean tensor and residual sparse component from the observed tensor. The recovery quality heavily depends on the de...

Youfa Liu, Bo Du, Yongyong Chen, Lefei Zhang… in International Journal of Computer Vision (2024)
Article

Diagram Perception Networks for Textbook Question Answering via Joint Optimization

Textbook question answering requires a system to answer questions with or without diagrams accurately, given multimodal contexts that include rich paragraphs and diagrams. Existing methods usually utilize a pi...

Jie Ma, Jun Liu, Qi Chai, Pinghui Wang… in International Journal of Computer Vision (2024)
Article

Robust Unpaired Image Dehazing via Density and Depth Decomposition

To overcome the overfitting issue of dehazing models trained on synthetic hazy-clean image pairs, recent methods attempt to boost the generalization ability by training on unpaired data. However, most of exist...

Yang Yang, Chaoyue Wang, Xiaojie Guo… in International Journal of Computer Vision (2024)
Article

VNAS: Variational Neural Architecture Search

Differentiable neural architecture search delivers point estimation to the optimal architecture, which yields arbitrarily high confidence to the learned architecture. This approach thus suffers in calibration ...

Benteng Ma, Jing Zhang, Yong Xia, Dacheng Tao in International Journal of Computer Vision (2024)
Article

EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm

Motivated by biological evolution, this paper explains the rationality of Vision Transformer by analogy with the proven practical evolutionary algorithm (EA) and derives that both have consistent mathematical ...

Jiangning Zhang, Xiangtai Li, Yabiao Wang… in International Journal of Computer Vision (2024)
Article

MMoT: Mixture-of-Modality-Tokens Transformer for Composed Multimodal Conditional Image Synthesis

Existing multimodal conditional image synthesis (MCIS) methods generate images conditioned on any combinations of various modalities that require all of them must be exactly conformed, hindering the synthesis ...

Jianbin Zheng, Daqing Liu, Chaoyue Wang… in International Journal of Computer Vision (2024)
Article

MixStyle Neural Networks for Domain Generalization and Adaptation

Neural networks do not generalize well to unseen data with domain shifts—a longstanding problem in machine learning and AI. To overcome the problem, we propose MixStyle, a simple plug-and-play, parameter-free ...

Kaiyang Zhou, Yongxin Yang, Yu Qiao, Tao Xiang in International Journal of Computer Vision (2024)
Article

Open Access

SFNet: Faster and Accurate Semantic Segmentation via Semantic Flow

In this paper, we focus on exploring effective methods for faster and accurate semantic segmentation. A common practice to improve the performance is to attain high-resolution feature maps with strong semantic...

Xiangtai Li, Jiangning Zhang, Yibo Yang… in International Journal of Computer Vision (2024)

Chapter and Conference Paper

MetaVSR: A Novel Approach to Video Super-Resolution for Arbitrary Magnification

Video super-resolution is a pivotal task that involves the recovery of high-resolution video frames from their low-resolution counterparts, possessing a multitude of applications in real-world scenarios. Withi...

Zixuan Hong, Weipeng Cao, Zhiwu Xu, Zhenru Chen, ** Tao, Zhong Ming… in MultiMedia Modeling (2024)
Chapter and Conference Paper

A Coarse and Fine Grained Masking Approach for Video-Grounded Dialogue

The task of Video-Grounded Dialogue involves developing a multimodal chatbot capable of answering sequential questions from humans regarding video content, audio, captions and dialog history. Although existing...

Feifei Xu, Wang Zhou, Tao Sun, Jiahao Lu, Ziheng Yu, Guangzhen Li in MultiMedia Modeling (2024)
Chapter and Conference Paper

High Capacity Reversible Data Hiding in Encrypted Images Based on Pixel Value Preprocessing and Block Classification

Reversible data hiding in encrypted images (RDHEI) can simultaneously achieve secure transmission of images and secret storage of embedded additional data, which can be used for cloud storage and privacy prote...

Tao Zhang, Ju Zhang, Yicheng Zou, Yu Zhang in MultiMedia Modeling (2024)
Article

Attribute-Image Person Re-identification via Modal-Consistent Metric Learning

Attribute-image person re-identification (AIPR) is a cross-modal retrieval task that searches person images who meet a list of attributes. Due to large modal gaps between attributes and images, current AIPR me...

Jianqing Zhu, Liu Liu, Yibing Zhan, Xiaobin Zhu… in International Journal of Computer Vision (2023)

136 Result(s)
