-
Chapter and Conference Paper
Descriptive Attributes for Language-Based Object Keypoint Detection
Multimodal vision and language (VL) models have recently shown strong performance in phrase grounding and object detection for both zero-shot and finetuned cases. We adapt a VL model (GLIP) for keypoint detect...
-
Chapter and Conference Paper
Text-Driven Stylization of Video Objects
We tackle the task of stylizing video objects in an intuitive and semantic manner following a user-specified text prompt. This is a challenging task as the resulting video must satisfy multiple properties: (1)...
-
Article (Open Access)
Occluded Video Instance Segmentation: A Benchmark
Can our video understanding systems perceive objects when a heavy occlusion exists in a scene? To answer this question, we collect a large-scale dataset called OVIS for occluded video instance segmentation, th...
-
Article
Convolutional Networks with Adaptive Inference Graphs
Do convolutional networks really need a fixed feed-forward structure? What if, after identifying the high-level concept of an image, a network could move directly to a layer that can distinguish fine-grained d...
-
Chapter and Conference Paper
Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset
In this work we explore the task of instance segmentation with attribute localization, which unifies instance segmentation (detect and segment each object instance) and fine-grained visual attribute categorizatio...
-
Chapter and Conference Paper
Learning Gradient Fields for Shape Generation
In this work, we propose a novel technique to generate shapes from point cloud data. A point cloud can be viewed as samples from a distribution of 3D points whose density is concentrated near the surface of th...
-
Chapter and Conference Paper
Deep Fundamental Matrix Estimation Without Correspondences
Estimating fundamental matrices is a classic problem in computer vision. Traditional methods rely heavily on the correctness of estimated key-point correspondences, which can be noisy and unreliable. As a resu...
-
Article
Vision-based real estate price estimation
Since the advent of online real estate database companies like Zillow, Trulia and Redfin, the problem of automatic estimation of market values for houses has received considerable attention. Several real estat...
-
Chapter and Conference Paper
Learning Single-View 3D Reconstruction with Limited Pose Supervision
It is expensive to label images with 3D structure or precise camera pose. Yet, this is precisely the kind of annotation required to train single-view 3D reconstruction models. In contrast, unlabeled images or ...
-
Chapter and Conference Paper
Convolutional Networks with Adaptive Inference Graphs
Do convolutional networks really need a fixed feed-forward structure? What if, after identifying the high-level concept of an image, a network could move directly to a layer that can distinguish fine-grained d...
-
Chapter
Cross-View Image Geo-localization
The recent availability of large amounts of geo-tagged imagery has inspired a number of data-driven solutions to the image geo-localization problem. Existing approaches predict the location of a query image by...
-
Chapter and Conference Paper
Discriminative Regions: A Substrate for Analyzing Life-Logging Image Sequences
Life-logging devices are becoming ubiquitous, yet still processing and extracting information from the vast amount of data that is being captured is a very challenging task. We propose a method to find discrim...
-
Article
Editorial: Special Issue on Active and Interactive Methods in Computer Vision
-
Article
The Ignorant Led by the Blind: A Hybrid Human–Machine Vision System for Fine-Grained Categorization
We present a visual recognition system for fine-grained visual categorization. The system is composed of a human and a machine working together and combines the complementary strengths of computer vision algor...
-
Chapter and Conference Paper
Microsoft COCO: Common Objects in Context
We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding. This ...
-
Chapter and Conference Paper
Camera Distance from Face Images
We present a method for estimating the distance between a camera and a human head in 2D images from a calibrated camera. Leading head pose estimation algorithms focus mainly on head orientation (yaw, pitch, an...
-
Chapter and Conference Paper
Face Box Shape and Verification
Successful face verification and recognition require matching corresponding points in a pair of images, and it is commonly acknowledged that alignment is a critical step prior to matching. Once aligned, a port...
-
Chapter and Conference Paper
JBoost Optimization of Color Detectors for Autonomous Underwater Vehicle Navigation
In the world of autonomous underwater vehicles (AUV) the prominent form of sensing is sonar due to cloudy water conditions and dispersion of light. Although underwater conditions are highly suitable for sonar,...
-
Article (Open Access)
Globally Optimal Algorithms for Stratified Autocalibration
We present practical algorithms for stratified autocalibration with theoretical guarantees of global optimality. Given a projective reconstruction, we first upgrade it to affine by estimating the position of t...
-
Chapter and Conference Paper
Word Spotting in the Wild
We present a method for spotting words in the wild, i.e., in real images taken in unconstrained environments. Text found in the wild has a surprising range of difficulty. At one end of the spectrum, Optical Chara...