  1. No Access

    Chapter and Conference Paper

    Dynamic Multi-modal Prompting for Efficient Visual Grounding

    Prompt tuning has emerged as a flexible approach for adapting pre-trained models by solely learning additional inputs while keeping the model parameters frozen. However, simplistic prompts are insufficient to ...

    Wansen Wu, Ting Liu, Youkai Wang, Kai Xu in Pattern Recognition and Computer Vision (2024)

  2. No Access

    Chapter and Conference Paper

    PANDA: Prompt-Based Context- and Indoor-Aware Pretraining for Vision and Language Navigation

    Pretrained visual-language models have extensive world knowledge and are widely used in visual and language navigation (VLN). However, they are not sensitive to indoor scenarios for VLN tasks. Another challe...

    Ting Liu, Yue Hu, Wansen Wu, Youkai Wang, Kai Xu, Quanjun Yin in MultiMedia Modeling (2024)

  3. No Access

    Chapter and Conference Paper

    ACT: Action-assoCiated and Target-Related Representations for Object Navigation

    Object navigation tasks require an agent to find a target in an unknown environment based on its observations. Researchers employ various techniques, such as extracting high-level semantic information and buil...

    Youkai Wang, Yue Hu, Wansen Wu, Ting Liu, Yong Peng in MultiMedia Modeling (2024)