-
Chapter and Conference Paper
Dynamic Multi-modal Prompting for Efficient Visual Grounding
Prompt tuning has emerged as a flexible approach for adapting pre-trained models by solely learning additional inputs while keeping the model parameters frozen. However, simplistic prompts are insufficient to ...
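The abstract describes prompt tuning only at a high level: learn extra inputs, keep the model frozen. As a rough illustration, and assuming a toy frozen "model" (mean pooling over token vectors followed by a fixed linear head `w`) with a squared-error loss, the sketch below updates only a few prepended prompt vectors by gradient descent while `w` never changes. The names `frozen_model` and `prompt_tune` are hypothetical; this is a generic prompt-tuning sketch, not the paper's dynamic multi-modal prompting method.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def frozen_model(tokens: List[Vector], w: Vector) -> float:
    """Hypothetical toy model: mean-pool token vectors, then apply a fixed linear head."""
    n = len(tokens)
    pooled = [sum(t[i] for t in tokens) / n for i in range(len(w))]
    return dot(w, pooled)


def prompt_tune(x: List[Vector], target: float, w: Vector,
                num_prompts: int = 2, lr: float = 0.5, steps: int = 200) -> List[Vector]:
    """Learn only the prompt vectors; the head weights w are read but never updated."""
    d = len(w)
    prompts = [[0.0] * d for _ in range(num_prompts)]  # learnable extra inputs
    for _ in range(steps):
        tokens = prompts + x  # prepend prompts to the (fixed) input sequence
        err = frozen_model(tokens, w) - target
        # Gradient of err**2 w.r.t. each prompt vector under mean pooling:
        # 2 * err * w / len(tokens); identical for every prompt vector.
        scale = 2.0 * err / len(tokens)
        for p in prompts:
            for i in range(d):
                p[i] -= lr * scale * w[i]
    return prompts
```

For example, with inputs `x = [[1.0, 2.0], [3.0, 0.0]]` and `w = [1.0, -0.5]`, the untuned output (zero prompts) is 0.75; `prompt_tune(x, 2.0, w)` moves it to about 2.0 by changing only the prompts.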
-
Chapter and Conference Paper
PANDA: Prompt-Based Context- and Indoor-Aware Pretraining for Vision and Language Navigation
Pretrained visual-language models have extensive world knowledge and are widely used in visual and language navigation (VLN). However, they are not sensitive to indoor scenarios for VLN tasks. Another challe...
-
Chapter and Conference Paper
ACT: Action-assoCiated and Target-Related Representations for Object Navigation
Object navigation tasks require an agent to find a target in an unknown environment based on its observations. Researchers employ various techniques, such as extracting high-level semantic information and buil...