Search Results
-
GFPE-ViT: vision transformer with geometric-fractal-based position encoding
In recent years, transformers have become a significant tool in computer vision, revolutionizing fundamental tasks. This paper focuses on the map...
-
Spatiotemporal Representation Enhanced ViT for Video Recognition
Vision Transformers (ViTs) are promising for solving video-related tasks, but often suffer from computational bottlenecks or insufficient temporal...
-
ViT-PGC: vision transformer for pedestrian gender classification on small-size dataset
Pedestrian gender classification (PGC) is a key task in full-body-based pedestrian image analysis and has become an important area in applications...
-
Add-Vit: CNN-Transformer Hybrid Architecture for Small Data Paradigm Processing
The vision transformer (ViT), pre-trained on large datasets, outperforms convolutional neural networks (CNNs) in computer vision (CV). However, if not...
-
Conv-ViT fusion for improved handwritten Arabic character classification
An essential aspect of pattern recognition pertains to handwriting recognition, particularly in languages with diverse character styles like Arabic....
-
Occluded pedestrian re-identification via Res-ViT double-branch hybrid network
Existing occluded pedestrian re-identification methods mainly utilize convolutional neural networks to realize the feature matching under different...
-
Hybrid CNN-ViT architecture to exploit spatio-temporal feature for fire recognition trained through transfer learning
Fires are becoming one of the major natural hazards worldwide, threatening ecology, economies, and human life. Therefore, early fire...
-
ViT-DAE: Transformer-Driven Diffusion Autoencoder for Histopathology Image Analysis
Generative AI has received substantial attention in recent years due to its ability to synthesize data that closely resembles the original data...
-
ViT-MPI: Vision Transformer Multiplane Images for Surgical Single-View View Synthesis
In this paper, we explore the use of a single imaging device to acquire immersive 3D perception in endoscopic surgery. To solve the heavily ill-posed...
-
Vision transformers (ViT) and deep convolutional neural network (D-CNN)-based models for MRI brain primary tumors images multi-classification supported by explainable artificial intelligence (XAI)
The manual classification of primary brain tumors through Magnetic Resonance Imaging (MRI) is considered a critical task during the clinical...
-
Enhancing Cell Detection in Histopathology Images: A ViT-Based U-Net Approach
Cell detection in histology images is a pivotal and fundamental task within the field of computational pathology. Recent advancements have led to the...
-
YOLO-based CAD framework with ViT transformer for breast mass detection and classification in CESM and FFDM images
Breast cancer detection is considered a challenging task for the average experienced radiologist due to the variation of the lesions’ size and shape,...
-
Multimodal Learning for Road Safety Using Vision Transformer ViT
This paper proposes a novel approach for multimodal learning that combines visual information from images with structured data from a multi-column...
-
ViT-Siamese Cascade Network for Transmission Image Deduplication
With the large-scale use of various inspection methods such as drones, helicopters, and robots, the generated power inspection images have increased...
-
Improved Image Captioning Using GAN and ViT
Encoder-decoder architectures are widely used in solving image captioning applications. Convolutional encoders and recurrent decoders are prominently...
-
On the Effectiveness of ViT Features as Local Semantic Descriptors
We study the use of deep features extracted from a pre-trained Vision Transformer (ViT) as dense visual descriptors. We observe and empirically...
-
Latent Diffusion Model-Based T2T-ViT for SAR Ship Classification
Recently, deep learning methods have been applied to ship classification in Synthetic Aperture Radar (SAR) images. However, because of the problem of...
-
FGPTQ-ViT: Fine-Grained Post-training Quantization for Vision Transformers
The complex architecture and high training cost of Vision Transformers (ViTs) have prompted the exploration of post-training quantization (PTQ)....
-
VFIQ: A Novel Model of ViT-FSIMc Hybrid Siamese Network for Image Quality Assessment
Image Quality Assessment (IQA) measures how humans perceive the quality of images. In this paper, we propose a new model named VFIQ – a...
-
TON-ViT: A Neuro-Symbolic AI Based on Task Oriented Network with a Vision Transformer
The objective of this paper is to present a neuro-symbolic AI-based technique to represent field-medicine knowledge, referred to as TON-ViT. TON-ViT...