-
Article
FLAVR: flow-free architecture for fast video frame interpolation
Many modern frame interpolation approaches rely on explicit bidirectional optical flows between adjacent frames, thus are sensitive to the accuracy of underlying flow estimation in handling occlusions while ad...
-
Chapter and Conference Paper
MemSAC: Memory Augmented Sample Consistency for Large Scale Domain Adaptation
Practical real world datasets with plentiful categories introduce new challenges for unsupervised domain adaptation like small inter-class discriminability, that existing approaches relying on domain invarianc...
-
Chapter and Conference Paper
Learning Semantic Segmentation from Multiple Datasets with Label Shifts
While it is desirable to train segmentation models on an aggregation of multiple datasets, a major challenge is that the label space of each dataset may be in conflict with one another. To tackle this challeng...
-
Chapter and Conference Paper
TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments
High-quality structured data with rich annotations are critical components in intelligent vehicle systems dealing with road scenes. However, data curation and annotation require intensive investments and yield...
-
Chapter and Conference Paper
A Level Set Theory for Neural Implicit Evolution Under Explicit Flows
Coordinate-based neural networks parameterizing implicit surfaces have emerged as efficient representations of geometry. They effectively act as parametric level sets with the zero-level set defining the surfa...
-
Chapter and Conference Paper
Learning Phase Mask for Privacy-Preserving Passive Depth Estimation
With over a billion sold each year, cameras are not only becoming ubiquitous, but are driving progress in a wide range of domains such as mixed reality, robotics, and more. However, severe concerns regarding t...
-
Chapter and Conference Paper
Exploiting Unlabeled Data with Vision and Language Models for Object Detection
Building robust and generic object detection frameworks requires scaling to larger label spaces and bigger training datasets. However, it is prohibitively costly to acquire annotations for thousands of categor...
-
Chapter and Conference Paper
Single-Stream Multi-level Alignment for Vision-Language Pretraining
Self-supervised vision-language pretraining from pure images and text with a contrastive loss is effective, but ignores fine-grained alignment due to a dual-stream architecture that aligns image and text repre...
-
Chapter and Conference Paper
Physically-Based Editing of Indoor Scene Lighting from a Single Image
We present a method to edit complex indoor lighting from a single image with its predicted depth and light source segmentation masks. This is an extremely challenging problem that requires modeling complex lig...
-
Reference Work Entry
Bas-Relief Ambiguity
-
Chapter and Conference Paper
Semantic Segmentation Datasets for Resource Constrained Training
Several large scale datasets, coupled with advances in deep neural network architectures have been greatly successful in pushing the boundaries of performance in semantic segmentation in recent years. However,...
-
Chapter and Conference Paper
Improving Face Recognition by Clustering Unlabeled Faces in the Wild
While deep face recognition has benefited significantly from large-scale labeled data, current research is focused on leveraging unlabeled data to further boost performance, reducing the cost of human annotati...
-
Chapter and Conference Paper
Single View Metrology in the Wild
Most 3D reconstruction methods may only recover scene properties up to a global scale ambiguity. We present a novel approach to single view metrology that can recover the absolute scale of a scene represented by ...
-
Chapter and Conference Paper
Single-Shot Neural Relighting and SVBRDF Estimation
We present a novel physically-motivated deep network for joint shape and material estimation, as well as relighting under novel illumination conditions, using a single image captured by a mobile phone camera. ...
-
Chapter and Conference Paper
Learning Monocular Visual Odometry via Self-Supervised Long-Term Modeling
Monocular visual odometry (VO) suffers severely from error accumulation during frame-to-frame pose estimation. In this paper, we present a self-supervised learning method for VO with special consideration for ...
-
Chapter and Conference Paper
Object Detection with a Unified Label Space from Multiple Datasets
Given multiple datasets with different label spaces, the goal of this work is to train a single object detector predicting over the union of all the label spaces. The practical benefits of such an object dete...
-
Chapter and Conference Paper
Pseudo RGB-D for Self-improving Monocular SLAM and Depth Prediction
Classical monocular Simultaneous Localization And Mapping (SLAM) and the recently emerging convolutional neural networks (CNNs) for monocular depth prediction represent two largely disjoint approaches towards ...
-
Chapter and Conference Paper
Domain Adaptive Semantic Segmentation Using Weak Labels
Learning semantic segmentation models requires a huge amount of pixel-wise labeling. However, labeled data may only be available abundantly in a domain different from the desired target domain, which only has ...
-
Chapter and Conference Paper
SMART: Simultaneous Multi-Agent Recurrent Trajectory Prediction
We propose advances that address two key challenges in future trajectory prediction: (i) multimodality in both training data and predictions and (ii) constant time inference regardless of number of agents. Exi...
-
Chapter and Conference Paper
Learning to Look around Objects for Top-View Representations of Outdoor Scenes
Given a single RGB image of a complex outdoor road scene in the perspective view, we address the novel problem of estimating an occlusion-reasoned semantic scene layout in the top-view. This challenging proble...