Page
%P
-
Chapter and Conference Paper
Video Instance Segmentation via Multi-Scale Spatio-Temporal Split Attention Transformer
State-of-the-art transformer-based video instance segmentation (VIS) approaches typically utilize either single-scale spatio-temporal features or per-frame multi-scale features during the attention computation...
-
Chapter and Conference Paper
EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications
In the pursuit of achieving ever-increasing accuracy, large and complex neural networks are usually developed. Such models demand high computational resources and therefore cannot be deployed on edge devices. ...