-
Article
Visually-Guided Audio Spatialization in Video with Geometry-Aware Multi-task Learning
Binaural audio provides human listeners with an immersive spatial sound experience, but most existing videos lack binaural audio recordings. We propose an audio spatialization method that draws on visual infor...
-
Chapter and Conference Paper
Egocentric Activity Recognition and Localization on a 3D Map
Given a video captured from a first person perspective and the environment context of where the video is recorded, can we recognize what the person is doing and identify where the action occurs in the 3D space...
-
Chapter and Conference Paper
Active Audio-Visual Separation of Dynamic Sound Sources
We explore active audio-visual separation for dynamic sound sources, where an embodied agent moves intelligently in a 3D environment to continuously isolate the time-varying audio stream being emitted by an objec...
-
Article
An Exploration of Embodied Visual Exploration
Embodied computer vision considers perception for robots in novel, unstructured environments. Of particular importance is the embodied visual exploration problem: how might a robot equipped with a camera scope...
-
Article
Densifying Supervision for Fine-Grained Visual Comparisons
Detecting subtle differences in visual attributes requires inferring which of two images exhibits a property more, e.g., which face is smiling slightly more, or which shoe is slightly more sporty. While valuab...
-
Chapter and Conference Paper
SoundSpaces: Audio-Visual Navigation in 3D Environments
Moving around in the world is naturally a multisensory experience, but today’s embodied agents are deaf—restricted to solely their visual perception of the environment. We introduce audio-visual navigation for...
-
Chapter and Conference Paper
VisualEchoes: Spatial Image Representation Learning Through Echolocation
Several animal species (e.g., bats, dolphins, and whales) and even visually impaired humans have the remarkable ability to perform echolocation: a biological sonar used to perceive spatial layout and locate ob...
-
Chapter and Conference Paper
Occupancy Anticipation for Efficient Exploration and Navigation
State-of-the-art navigation methods leverage a spatial memory to generalize to new environments, but their occupancy maps are limited to capturing the geometric structures directly observed by the agent. We pr...
-
Chapter and Conference Paper
Proposal-Based Video Completion
Video inpainting is an important technique for a wide variety of applications from video content editing to video restoration. Early approaches follow image inpainting paradigms, but are challenged by complex ...
-
Article
Click Carving: Interactive Object Segmentation in Images and Videos with Point Clicks
We present a novel form of interactive object segmentation called Click Carving which enables accurate segmentation of objects in images and videos with only a few point clicks. Whereas conventional interactive p...
-
Article
Predicting How to Distribute Work Between Algorithms and Humans to Segment an Image Batch
Foreground object segmentation is a critical step for many image analysis tasks. While automated methods can produce high-quality results, their failures disappoint users in need of practical solutions. We pro...
-
Article
Predicting Foreground Object Ambiguity and Efficiently Crowdsourcing the Segmentation(s)
We propose the ambiguity problem for the foreground object segmentation task and motivate the importance of estimating and accounting for this ambiguity when designing vision systems. Specifically, we distingu...
-
Article
Subjects and Their Objects: Localizing Interactees for a Person-Centric View of Importance
Understanding images with people often entails understanding their interactions with other objects or people. As such, given a novel image, a vision system ought to infer which other objects/people play an import...
-
Chapter and Conference Paper
Retrospective Encoders for Video Summarization
Supervised learning techniques have shown substantial progress on video summarization. State-of-the-art approaches mostly regard the predicted summary and the human summary as two sequences (sets), and minimiz...
-
Chapter and Conference Paper
Sidekick Policy Learning for Active Visual Exploration
We consider an active visual exploration scenario, where an agent must intelligently select its camera motions to efficiently reconstruct the full environment from only a limited set of narrow field-of-view glimp...
-
Chapter and Conference Paper
ShapeCodes: Self-supervised Feature Learning by Lifting Views to Viewgrids
We introduce an unsupervised feature learning approach that embeds 3D shape information into a single-view image representation. The main idea is a self-supervised training objective that, given only a single ...
-
Chapter and Conference Paper
Attributes as Operators: Factorizing Unseen Attribute-Object Compositions
We present a new approach to modeling visual attributes. Prior work casts attributes in a similar role as objects, learning a latent representation where properties (e.g., sliced) are recognized by classifiers mu...
-
Chapter and Conference Paper
Snap Angle Prediction for 360° Panoramas
360° ...
-
Chapter and Conference Paper
Learning to Separate Object Sounds by Watching Unlabeled Video
Perceiving a scene most fully requires all the senses. Yet modeling how objects look and sound is challenging: most natural scenes and events contain multiple objects, and the audio track mixes all the sound s...
-
Article
Learning Image Representations Tied to Egomotion from Unlabeled Video
Understanding how images of objects and scenes behave in response to specific egomotions is a crucial aspect of proper visual development, yet existing visual learning methods are conspicuously disconnected fr...