-
Chapter and Conference Paper
Descriptive Attributes for Language-Based Object Keypoint Detection
Multimodal vision and language (VL) models have recently shown strong performance in phrase grounding and object detection for both zero-shot and finetuned cases. We adapt a VL model (GLIP) for keypoint detect...
-
Chapter and Conference Paper
Text-Driven Stylization of Video Objects
We tackle the task of stylizing video objects in an intuitive and semantic manner following a user-specified text prompt. This is a challenging task as the resulting video must satisfy multiple properties: (1)...
-
Chapter and Conference Paper
SITTA: Single Image Texture Translation for Data Augmentation
Recent advances in data augmentation enable one to translate images by learning the mapping between a source domain and a target domain. Existing methods tend to learn the distributions by training a model on ...
-
Chapter and Conference Paper
Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset
We present a new benchmark dataset, Sapsucker Woods 60 (SSW60), for advancing research on audiovisual fine-grained categorization. While our community has made great strides in fine-grained visual categorizati...
-
Chapter and Conference Paper
Visual Prompt Tuning
The current modus operandi in adapting pre-trained models involves updating all the backbone parameters, i.e., full fine-tuning. This paper introduces Visual Prompt Tuning (VPT) as an efficient and effective alte...
-
Chapter and Conference Paper
On Label Granularity and Object Localization
Weakly supervised object localization (WSOL) aims to learn representations that encode object location using only image-level category labels. However, many objects can be labeled at different levels of granul...
-
Chapter and Conference Paper
A Metric Learning Reality Check
Deep metric learning papers from the past four years have consistently claimed great advances in accuracy, often more than doubling the performance of decade-old methods. In this paper, we take a closer look a...
-
Chapter and Conference Paper
Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset
In this work we explore the task of instance segmentation with attribute localization, which unifies instance segmentation (detect and segment each object instance) and fine-grained visual attribute categorizatio...
-
Chapter and Conference Paper
Learning Gradient Fields for Shape Generation
In this work, we propose a novel technique to generate shapes from point cloud data. A point cloud can be viewed as samples from a distribution of 3D points whose density is concentrated near the surface of th...
-
Chapter and Conference Paper
Deep Fundamental Matrix Estimation Without Correspondences
Estimating fundamental matrices is a classic problem in computer vision. Traditional methods rely heavily on the correctness of estimated key-point correspondences, which can be noisy and unreliable. As a resu...
-
Chapter and Conference Paper
Learning Single-View 3D Reconstruction with Limited Pose Supervision
It is expensive to label images with 3D structure or precise camera pose. Yet, this is precisely the kind of annotation required to train single-view 3D reconstruction models. In contrast, unlabeled images or ...
-
Chapter and Conference Paper
Multimodal Unsupervised Image-to-Image Translation
Unsupervised image-to-image translation is an important and challenging problem in computer vision. Given an image in the source domain, the goal is to learn the conditional distribution of corresponding image...
-
Chapter and Conference Paper
Convolutional Networks with Adaptive Inference Graphs
Do convolutional networks really need a fixed feed-forward structure? What if, after identifying the high-level concept of an image, a network could move directly to a layer that can distinguish fine-grained d...
-
Chapter and Conference Paper
Discriminative Regions: A Substrate for Analyzing Life-Logging Image Sequences
Life-logging devices are becoming ubiquitous, yet processing and extracting information from the vast amount of data being captured is still a very challenging task. We propose a method to find discrim...
-
Chapter and Conference Paper
Microsoft COCO: Common Objects in Context
We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding. This ...
-
Chapter and Conference Paper
Camera Distance from Face Images
We present a method for estimating the distance between a camera and a human head in 2D images from a calibrated camera. Leading head pose estimation algorithms focus mainly on head orientation (yaw, pitch, an...
-
Chapter and Conference Paper
Face Box Shape and Verification
Successful face verification and recognition require matching corresponding points in a pair of images, and it is commonly acknowledged that alignment is a critical step prior to matching. Once aligned, a port...
-
Chapter and Conference Paper
JBoost Optimization of Color Detectors for Autonomous Underwater Vehicle Navigation
In the world of autonomous underwater vehicles (AUVs), the prominent form of sensing is sonar due to cloudy water conditions and dispersion of light. Although underwater conditions are highly suitable for sonar,...
-
Chapter and Conference Paper
Word Spotting in the Wild
We present a method for spotting words in the wild, i.e., in real images taken in unconstrained environments. Text found in the wild has a surprising range of difficulty. At one end of the spectrum, Optical Chara...
-
Chapter and Conference Paper
Visual Recognition with Humans in the Loop
We present an interactive, hybrid human-computer method for object classification. The method applies to classes of objects that are recognizable by people with appropriate expertise (e.g., animal species or airp...