Search Results - Springer


Article

Visually-Guided Audio Spatialization in Video with Geometry-Aware Multi-task Learning

Binaural audio provides human listeners with an immersive spatial sound experience, but most existing videos lack binaural audio recordings. We propose an audio spatialization method that draws on visual infor...

Rishabh Garg, Ruohan Gao, Kristen Grauman in International Journal of Computer Vision (2023)
Chapter and Conference Paper

Egocentric Activity Recognition and Localization on a 3D Map

Given a video captured from a first person perspective and the environment context of where the video is recorded, can we recognize what the person is doing and identify where the action occurs in the 3D space...

Miao Liu, Lingni Ma, Kiran Somasundaram, Yin Li… in Computer Vision – ECCV 2022 (2022)
Chapter and Conference Paper

Active Audio-Visual Separation of Dynamic Sound Sources

We explore active audio-visual separation for dynamic sound sources, where an embodied agent moves intelligently in a 3D environment to continuously isolate the time-varying audio stream being emitted by an objec...

Sagnik Majumder, Kristen Grauman in Computer Vision – ECCV 2022 (2022)
Article

An Exploration of Embodied Visual Exploration

Embodied computer vision considers perception for robots in novel, unstructured environments. Of particular importance is the embodied visual exploration problem: how might a robot equipped with a camera scope...

Santhosh K. Ramakrishnan, Dinesh Jayaraman… in International Journal of Computer Vision (2021)
Article

Densifying Supervision for Fine-Grained Visual Comparisons

Detecting subtle differences in visual attributes requires inferring which of two images exhibits a property more, e.g., which face is smiling slightly more, or which shoe is slightly more sporty. While valuab...

Aron Yu, Kristen Grauman in International Journal of Computer Vision (2020)
Chapter and Conference Paper

SoundSpaces: Audio-Visual Navigation in 3D Environments

Moving around in the world is naturally a multisensory experience, but today’s embodied agents are deaf—restricted to solely their visual perception of the environment. We introduce audio-visual navigation for...

Changan Chen, Unnat Jain, Carl Schissler… in Computer Vision – ECCV 2020 (2020)
Chapter and Conference Paper

VisualEchoes: Spatial Image Representation Learning Through Echolocation

Several animal species (e.g., bats, dolphins, and whales) and even visually impaired humans have the remarkable ability to perform echolocation: a biological sonar used to perceive spatial layout and locate ob...

Ruohan Gao, Changan Chen, Ziad Al-Halah, Carl Schissler… in Computer Vision – ECCV 2020 (2020)
Chapter and Conference Paper

Occupancy Anticipation for Efficient Exploration and Navigation

State-of-the-art navigation methods leverage a spatial memory to generalize to new environments, but their occupancy maps are limited to capturing the geometric structures directly observed by the agent. We pr...

Santhosh K. Ramakrishnan, Ziad Al-Halah, Kristen Grauman in Computer Vision – ECCV 2020 (2020)
Chapter and Conference Paper

Proposal-Based Video Completion

Video inpainting is an important technique for a wide variety of applications from video content editing to video restoration. Early approaches follow image inpainting paradigms, but are challenged by complex ...

Yuan-Ting Hu, Heng Wang, Nicolas Ballas, Kristen Grauman… in Computer Vision – ECCV 2020 (2020)
Article

Click Carving: Interactive Object Segmentation in Images and Videos with Point Clicks

We present a novel form of interactive object segmentation called Click Carving which enables accurate segmentation of objects in images and videos with only a few point clicks. Whereas conventional interactive p...

Suyog Dutt Jain, Kristen Grauman in International Journal of Computer Vision (2019)
Article

Predicting How to Distribute Work Between Algorithms and Humans to Segment an Image Batch

Foreground object segmentation is a critical step for many image analysis tasks. While automated methods can produce high-quality results, their failures disappoint users in need of practical solutions. We pro...

Danna Gurari, Yinan Zhao, Suyog Dutt Jain… in International Journal of Computer Vision (2019)
Article

Predicting Foreground Object Ambiguity and Efficiently Crowdsourcing the Segmentation(s)

We propose the ambiguity problem for the foreground object segmentation task and motivate the importance of estimating and accounting for this ambiguity when designing vision systems. Specifically, we distingu...

Danna Gurari, Kun He, Bo Xiong, Jianming Zhang… in International Journal of Computer Vision (2018)
Article

Subjects and Their Objects: Localizing Interactees for a Person-Centric View of Importance

Understanding images with people often entails understanding their interactions with other objects or people. As such, given a novel image, a vision system ought to infer which other objects/people play an import...

Chao-Yeh Chen, Kristen Grauman in International Journal of Computer Vision (2018)
Chapter and Conference Paper

Retrospective Encoders for Video Summarization

Supervised learning techniques have shown substantial progress on video summarization. State-of-the-art approaches mostly regard the predicted summary and the human summary as two sequences (sets), and minimiz...

Ke Zhang, Kristen Grauman, Fei Sha in Computer Vision – ECCV 2018 (2018)

Chapter and Conference Paper

Sidekick Policy Learning for Active Visual Exploration

We consider an active visual exploration scenario, where an agent must intelligently select its camera motions to efficiently reconstruct the full environment from only a limited set of narrow field-of-view glimp...

Santhosh K. Ramakrishnan, Kristen Grauman in Computer Vision – ECCV 2018 (2018)

Chapter and Conference Paper

ShapeCodes: Self-supervised Feature Learning by Lifting Views to Viewgrids

We introduce an unsupervised feature learning approach that embeds 3D shape information into a single-view image representation. The main idea is a self-supervised training objective that, given only a single ...

Dinesh Jayaraman, Ruohan Gao, Kristen Grauman in Computer Vision – ECCV 2018 (2018)

Chapter and Conference Paper

Attributes as Operators: Factorizing Unseen Attribute-Object Compositions

We present a new approach to modeling visual attributes. Prior work casts attributes in a similar role as objects, learning a latent representation where properties (e.g., sliced) are recognized by classifiers mu...

Tushar Nagarajan, Kristen Grauman in Computer Vision – ECCV 2018 (2018)

Chapter and Conference Paper

Snap Angle Prediction for 360° Panoramas

360° ...

Bo Xiong, Kristen Grauman in Computer Vision – ECCV 2018 (2018)

Chapter and Conference Paper

Learning to Separate Object Sounds by Watching Unlabeled Video

Perceiving a scene most fully requires all the senses. Yet modeling how objects look and sound is challenging: most natural scenes and events contain multiple objects, and the audio track mixes all the sound s...

Ruohan Gao, Rogerio Feris, Kristen Grauman in Computer Vision – ECCV 2018 (2018)

Article

Learning Image Representations Tied to Egomotion from Unlabeled Video

Understanding how images of objects and scenes behave in response to specific egomotions is a crucial aspect of proper visual development, yet existing visual learning methods are conspicuously disconnected fr...

Dinesh Jayaraman, Kristen Grauman in International Journal of Computer Vision (2017)

64 Result(s)

