-
Article
Vision-language navigation: a survey and taxonomy
Vision-language navigation (VLN) tasks require an agent to follow language instructions from a human guide to navigate in previously unseen environments using visual observations. This challenging field, invol...
-
Chapter and Conference Paper
Dynamic Multi-modal Prompting for Efficient Visual Grounding
Prompt tuning has emerged as a flexible approach for adapting pre-trained models by solely learning additional inputs while keeping the model parameters frozen. However, simplistic prompts are insufficient to ...
-
Chapter and Conference Paper
PANDA: Prompt-Based Context- and Indoor-Aware Pretraining for Vision and Language Navigation
Pretrained visual-language models have extensive world knowledge and are widely used in visual and language navigation (VLN). However, they are not sensitive to indoor scenarios for VLN tasks. Another challe...
-
Chapter and Conference Paper
ACT: Action-assoCiated and Target-Related Representations for Object Navigation
Object navigation tasks require an agent to find a target in an unknown environment based on its observations. Researchers employ various techniques, such as extracting high-level semantic information and buil...