Search Results

Showing 1-20 of 3,621 results
  1. GFPE-ViT: vision transformer with geometric-fractal-based position encoding

    In recent years, transformers have become a significant tool in computer vision, revolutionizing fundamental tasks. This paper focuses on the mapping...

    Lei Wang, Xue-song Tang, Kuangrong Hao in The Visual Computer
    Article 17 April 2024
  2. Spatiotemporal Representation Enhanced ViT for Video Recognition

    Vision Transformers (ViTs) are promising for solving video-related tasks, but often suffer from computational bottlenecks or insufficient temporal...
    Min Li, Fengfa Li, ... Chenghua Gao in MultiMedia Modeling
    Conference paper 2024
  3. ViT-PGC: vision transformer for pedestrian gender classification on small-size dataset

    Pedestrian gender classification (PGC) is a key task in full-body-based pedestrian image analysis and has become an important area in applications...

    Farhat Abbas, Mussarat Yasmin, ... Usman Asim in Pattern Analysis and Applications
    Article 26 September 2023
  4. Add-Vit: CNN-Transformer Hybrid Architecture for Small Data Paradigm Processing

    The vision transformer(ViT), pre-trained on large datasets, outperforms convolutional neural networks (CNN) in computer vision(CV). However, if not...

    **hui Chen, Peng Wu, ... Jia Liang in Neural Processing Letters
    Article Open access 07 June 2024
  5. Conv-ViT fusion for improved handwritten Arabic character classification

    An essential aspect of pattern recognition pertains to handwriting recognition, particularly in languages with diverse character styles like Arabic....

    Sarra Rouabhi, Abdennour Azerine, ... Lhassane Idoumghar in Signal, Image and Video Processing
    Article 29 April 2024
  6. Occluded pedestrian re-identification via Res-ViT double-branch hybrid network

    Existing occluded pedestrian re-identification methods mainly utilize convolutional neural networks to realize the feature matching under different...

    Yunbin Zhao, Songhao Zhu in Multimedia Systems
    Article 12 January 2024
  7. Hybrid CNN-ViT architecture to exploit spatio-temporal feature for fire recognition trained through transfer learning

    Fires are becoming one of the major natural hazards that threaten the ecology, economy, human life and even more worldwide. Therefore, early fire...

    Mohammad Shahid, Hong-Cyuan Wang, ... Kai-Lung Hua in Multimedia Tools and Applications
    Article 25 March 2024
  8. ViT-DAE: Transformer-Driven Diffusion Autoencoder for Histopathology Image Analysis

    Generative AI has received substantial attention in recent years due to its ability to synthesize data that closely resembles the original data...
    Xuan Xu, Saarthak Kapse, ... Prateek Prasanna in Deep Generative Models
    Conference paper 2024
  9. ViT-MPI: Vision Transformer Multiplane Images for Surgical Single-View View Synthesis

    In this paper, we explore the use of a single imaging device to acquire immersive 3D perception in endoscopic surgery. To solve the heavily ill-posed...
    Chenming Han, Ruizhi Shao, ... Yebin Liu in Artificial Intelligence
    Conference paper 2024
  10. Vision transformers (ViT) and deep convolutional neural network (D-CNN)-based models for MRI brain primary tumors images multi-classification supported by explainable artificial intelligence (XAI)

    The manual classification of primary brain tumors through Magnetic Resonance Imaging (MRI) is considered as a critical task during the clinical...

    Hiba Mzoughi, Ines Njeh, ... Chokri Mhiri in The Visual Computer
    Article 26 June 2024
  11. Enhancing Cell Detection in Histopathology Images: A ViT-Based U-Net Approach

    Cell detection in histology images is a pivotal and fundamental task within the field of computational pathology. Recent advancements have led to the...
    Conference paper 2024
  12. YOLO-based CAD framework with ViT transformer for breast mass detection and classification in CESM and FFDM images

    Breast cancer detection is considered a challenging task for the average experienced radiologist due to the variation of the lesions’ size and shape,...

    Nada M. Hassan, Safwat Hamad, Khaled Mahar in Neural Computing and Applications
    Article Open access 16 January 2024
  13. Multimodal Learning for Road Safety Using Vision Transformer ViT

    This paper proposes a novel approach for multimodal learning that combines visual information from images with structured data from a multi-column...
    Asmae Rhanizar, Zineb El Akkaoui in New Technologies, Artificial Intelligence and Smart Data
    Conference paper 2024
  14. ViT-Siamese Cascade Network for Transmission Image Deduplication

    With the large-scale use of various inspection methods such as drones, helicopters, and robots, the generated power inspection images have increased...
    Zhenyu Chen, Siyu Chen, ... Xiaoyu Zhang in Digital Multimedia Communications
    Conference paper 2023
  15. Improved Image Captioning Using GAN and ViT

    Encoder-decoder architectures are widely used in solving image captioning applications. Convolutional encoders and recurrent decoders are prominently...
    Vrushank D. Rao, B. N. Shashank, S. Nagesh Bhattu in Computer Vision and Image Processing
    Conference paper 2024
  16. On the Effectiveness of ViT Features as Local Semantic Descriptors

    We study the use of deep features extracted from a pre-trained Vision Transformer (ViT) as dense visual descriptors. We observe and empirically...
    Shir Amir, Yossi Gandelsman, ... Tali Dekel in Computer Vision – ECCV 2022 Workshops
    Conference paper 2023
  17. Latent Diffusion Model-Based T2T-ViT for SAR Ship Classification

    Recently, deep learning methods have been applied to ship classification in Synthetic Aperture Radar (SAR) images. However, because of the problem of...
    Yuhang Qi, Lu Wang, ... Chunhui Zhao in Computer Supported Cooperative Work and Social Computing
    Conference paper 2024
  18. FGPTQ-ViT: Fine-Grained Post-training Quantization for Vision Transformers

    The complex architecture and high training cost of Vision Transformers (ViTs) have prompted the exploration of post-training quantization (PTQ)....
    Caihua Liu, Hongyang Shi, **nyu He in Pattern Recognition and Computer Vision
    Conference paper 2024
  19. VFIQ: A Novel Model of ViT-FSIMc Hybrid Siamese Network for Image Quality Assessment

    The Image Quality Assessment (IQA) is to measure how humans perceive the quality of images. In this paper, we propose a new model named for VFIQ – a...
    Junrong Huang, Chenwei Wang in Neural Information Processing
    Conference paper 2024
  20. TON-ViT: A Neuro-Symbolic AI Based on Task Oriented Network with a Vision Transformer

    The objective of this paper is to present a neuro-symbolic AI based technique to represent field-medicine knowledge, referred as to TON-ViT. TON-ViT...
    Yupeng Zhuo, Nina Jiang, ... Juan Wachs in Medical Image Understanding and Analysis
    Conference paper 2024