Search Results
-
GFPE-ViT: vision transformer with geometric-fractal-based position encoding
In recent years, transformers have become a significant tool in computer vision, revolutionizing fundamental tasks. This paper focuses on the map...
-
Spatiotemporal Representation Enhanced ViT for Video Recognition
Vision Transformers (ViTs) are promising for solving video-related tasks, but often suffer from computational bottlenecks or insufficient temporal...
-
ViT-PGC: vision transformer for pedestrian gender classification on small-size dataset
Pedestrian gender classification (PGC) is a key task in full-body-based pedestrian image analysis and has become an important area in applications...
-
Add-Vit: CNN-Transformer Hybrid Architecture for Small Data Paradigm Processing
The vision transformer (ViT), pre-trained on large datasets, outperforms convolutional neural networks (CNNs) in computer vision (CV). However, if not...
-
Conv-ViT fusion for improved handwritten Arabic character classification
An essential aspect of pattern recognition pertains to handwriting recognition, particularly in languages with diverse character styles like Arabic....
-
Occluded pedestrian re-identification via Res-ViT double-branch hybrid network
Existing occluded pedestrian re-identification methods mainly utilize convolutional neural networks to realize the feature matching under different...
-
Hybrid CNN-ViT architecture to exploit spatio-temporal feature for fire recognition trained through transfer learning
Fires are becoming one of the major natural hazards worldwide, threatening ecology, economies, and human life. Therefore, early fire...
-
ViT-DAE: Transformer-Driven Diffusion Autoencoder for Histopathology Image Analysis
Generative AI has received substantial attention in recent years due to its ability to synthesize data that closely resembles the original data...
-
ViT-MPI: Vision Transformer Multiplane Images for Surgical Single-View View Synthesis
In this paper, we explore the use of a single imaging device to acquire immersive 3D perception in endoscopic surgery. To solve the heavily ill-posed...
-
Vision transformers (ViT) and deep convolutional neural network (D-CNN)-based models for MRI brain primary tumors images multi-classification supported by explainable artificial intelligence (XAI)
The manual classification of primary brain tumors through Magnetic Resonance Imaging (MRI) is considered a critical task during the clinical...
-
Enhancing Cell Detection in Histopathology Images: A ViT-Based U-Net Approach
Cell detection in histology images is a pivotal and fundamental task within the field of computational pathology. Recent advancements have led to the...
-
YOLO-based CAD framework with ViT transformer for breast mass detection and classification in CESM and FFDM images
Breast cancer detection is considered a challenging task for the average experienced radiologist due to the variation of the lesions’ size and shape,...
-
Multimodal Learning for Road Safety Using Vision Transformer ViT
This paper proposes a novel approach for multimodal learning that combines visual information from images with structured data from a multi-column...
-
ViT-Siamese Cascade Network for Transmission Image Deduplication
With the large-scale use of various inspection methods such as drones, helicopters, and robots, the generated power inspection images have increased...
-
Improved Image Captioning Using GAN and ViT
Encoder-decoder architectures are widely used in solving image captioning applications. Convolutional encoders and recurrent decoders are prominently...
-
On the Effectiveness of ViT Features as Local Semantic Descriptors
We study the use of deep features extracted from a pre-trained Vision Transformer (ViT) as dense visual descriptors. We observe and empirically...
-
Latent Diffusion Model-Based T2T-ViT for SAR Ship Classification
Recently, deep learning methods have been applied to ship classification in Synthetic Aperture Radar (SAR) images. However, because of the problem of...
-
FGPTQ-ViT: Fine-Grained Post-training Quantization for Vision Transformers
The complex architecture and high training cost of Vision Transformers (ViTs) have prompted the exploration of post-training quantization (PTQ)....
-
VFIQ: A Novel Model of ViT-FSIMc Hybrid Siamese Network for Image Quality Assessment
Image Quality Assessment (IQA) measures how humans perceive the quality of images. In this paper, we propose a new model named VFIQ – a...
-
TON-ViT: A Neuro-Symbolic AI Based on Task Oriented Network with a Vision Transformer
The objective of this paper is to present a neuro-symbolic AI-based technique to represent field-medicine knowledge, referred to as TON-ViT. TON-ViT...