-
Article
Vision-language navigation: a survey and taxonomy
Vision-language navigation (VLN) tasks require an agent to follow language instructions from a human guide to navigate in previously unseen environments using visual observations. This challenging field, invol...
-
Chapter and Conference Paper
Dynamic Multi-modal Prompting for Efficient Visual Grounding
Prompt tuning has emerged as a flexible approach for adapting pre-trained models by solely learning additional inputs while keeping the model parameters frozen. However, simplistic prompts are insufficient to ...
-
Chapter and Conference Paper
PANDA: Prompt-Based Context- and Indoor-Aware Pretraining for Vision and Language Navigation
Pretrained visual-language models have extensive world knowledge and are widely used in visual and language navigation (VLN). However, they are not sensitive to indoor scenarios for VLN tasks. Another challe...
-
Chapter and Conference Paper
ACT: Action-assoCiated and Target-Related Representations for Object Navigation
Object navigation tasks require an agent to find a target in an unknown environment based on its observations. Researchers employ various techniques, such as extracting high-level semantic information and buil...