-
Chapter and Conference Paper
Descriptive Attributes for Language-Based Object Keypoint Detection
Multimodal vision and language (VL) models have recently shown strong performance in phrase grounding and object detection for both zero-shot and finetuned cases. We adapt a VL model (GLIP) for keypoint detect...
-
Chapter and Conference Paper
Text-Driven Stylization of Video Objects
We tackle the task of stylizing video objects in an intuitive and semantic manner following a user-specified text prompt. This is a challenging task as the resulting video must satisfy multiple properties: (1)...
-
Chapter and Conference Paper
SITTA: Single Image Texture Translation for Data Augmentation
Recent advances in data augmentation enable one to translate images by learning the mapping between a source domain and a target domain. Existing methods tend to learn the distributions by training a model on ...
-
Chapter and Conference Paper
Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset
We present a new benchmark dataset, Sapsucker Woods 60 (SSW60), for advancing research on audiovisual fine-grained categorization. While our community has made great strides in fine-grained visual categorizati...
-
Chapter and Conference Paper
Visual Prompt Tuning
The current modus operandi in adapting pre-trained models involves updating all the backbone parameters, i.e., full fine-tuning. This paper introduces Visual Prompt Tuning (VPT) as an efficient and effective alte...
-
Chapter and Conference Paper
On Label Granularity and Object Localization
Weakly supervised object localization (WSOL) aims to learn representations that encode object location using only image-level category labels. However, many objects can be labeled at different levels of granul...
-
Chapter and Conference Paper
A Metric Learning Reality Check
Deep metric learning papers from the past four years have consistently claimed great advances in accuracy, often more than doubling the performance of decade-old methods. In this paper, we take a closer look a...
-
Chapter and Conference Paper
Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset
In this work we explore the task of instance segmentation with attribute localization, which unifies instance segmentation (detect and segment each object instance) and fine-grained visual attribute categorizatio...
-
Chapter and Conference Paper
Learning Gradient Fields for Shape Generation
In this work, we propose a novel technique to generate shapes from point cloud data. A point cloud can be viewed as samples from a distribution of 3D points whose density is concentrated near the surface of th...
-
Chapter and Conference Paper
Deep Fundamental Matrix Estimation Without Correspondences
Estimating fundamental matrices is a classic problem in computer vision. Traditional methods rely heavily on the correctness of estimated key-point correspondences, which can be noisy and unreliable. As a resu...
-
Chapter
Crowd Research: Open and Scalable University Laboratories
Research experiences today are limited to a privileged few at select universities. Providing open access to research experiences would enable global upward mobility and increased diversity in the scientific wo...
-
Chapter and Conference Paper
Learning Single-View 3D Reconstruction with Limited Pose Supervision
It is expensive to label images with 3D structure or precise camera pose. Yet, this is precisely the kind of annotation required to train single-view 3D reconstruction models. In contrast, unlabeled images or ...
-
Chapter and Conference Paper
Multimodal Unsupervised Image-to-Image Translation
Unsupervised image-to-image translation is an important and challenging problem in computer vision. Given an image in the source domain, the goal is to learn the conditional distribution of corresponding image...
-
Chapter and Conference Paper
Convolutional Networks with Adaptive Inference Graphs
Do convolutional networks really need a fixed feed-forward structure? What if, after identifying the high-level concept of an image, a network could move directly to a layer that can distinguish fine-grained d...
-
Chapter
Cross-View Image Geo-localization
The recent availability of large amounts of geo-tagged imagery has inspired a number of data-driven solutions to the image geo-localization problem. Existing approaches predict the location of a query image by...
-
Chapter and Conference Paper
Discriminative Regions: A Substrate for Analyzing Life-Logging Image Sequences
Life-logging devices are becoming ubiquitous, yet still processing and extracting information from the vast amount of data that is being captured is a very challenging task. We propose a method to find discrim...
-
Chapter and Conference Paper
Microsoft COCO: Common Objects in Context
We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding. This ...
-
Chapter and Conference Paper
Camera Distance from Face Images
We present a method for estimating the distance between a camera and a human head in 2D images from a calibrated camera. Leading head pose estimation algorithms focus mainly on head orientation (yaw, pitch, an...
-
Chapter and Conference Paper
Face Box Shape and Verification
Successful face verification and recognition require matching corresponding points in a pair of images, and it is commonly acknowledged that alignment is a critical step prior to matching. Once aligned, a port...
-
Chapter and Conference Paper
JBoost Optimization of Color Detectors for Autonomous Underwater Vehicle Navigation
In the world of autonomous underwater vehicles (AUV) the prominent form of sensing is sonar due to cloudy water conditions and dispersion of light. Although underwater conditions are highly suitable for sonar,...