-
Chapter and Conference Paper
Video Instance Segmentation via Multi-Scale Spatio-Temporal Split Attention Transformer
State-of-the-art transformer-based video instance segmentation (VIS) approaches typically utilize either single-scale spatio-temporal features or per-frame multi-scale features during the attention computation...
-
Chapter and Conference Paper
EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications
In the pursuit of achieving ever-increasing accuracy, large and complex neural networks are usually developed. Such models demand high computational resources and therefore cannot be deployed on edge devices. ...
-
Chapter and Conference Paper
PS-ARM: An End-to-End Attention-Aware Relation Mixer Network for Person Search
Person search is a challenging problem with various real-world applications that aims at joint person detection and re-identification of a query person from uncropped gallery images. Although previous study ...