-
Article
Spatially-Varying Illumination-Aware Indoor Harmonization
In this paper, we address the problem of spatially-varying illumination-aware indoor harmonization. Existing image harmonization works either focus on extracting no more than 2D information (e.g., low-level st...
-
Article
ZMNet: feature fusion and semantic boundary supervision for real-time semantic segmentation
The feature fusion module is an essential component of real-time semantic segmentation networks to bridge the semantic gap among different feature layers. However, many networks are inefficient in multi-level feat...
-
Article
ViDSOD-100: A New Dataset and a Baseline Model for RGB-D Video Salient Object Detection
With the rapid development of depth sensors, more and more RGB-D videos can be obtained. Identifying the foreground in RGB-D videos is a fundamental and important task. However, the existing salient object de...
-
Article (Open Access)
Joint training with local soft attention and dual cross-neighbor label smoothing for unsupervised person re-identification
Existing unsupervised person re-identification approaches fail to fully capture the fine-grained features of local regions, which can result in people with similar appearances and different identities being as...
-
Article
Offline handwritten mathematical expression recognition based on YOLOv5s
Error accumulation in traditional offline handwritten mathematical expression recognition (OHMER) is challenging because of the two-dimensional structure and writing arbitrariness of offline handwrit...
-
Article
Machine reading comprehension model based on query reconstruction technology and deep learning
Machine reading comprehension is introduced to improve machines’ ability to read and understand human languages. This sophisticated version of natural language processing is used for testing and improvin...
-
Chapter and Conference Paper
Joint Training or Not: An Exploration of Pre-trained Speech Models in Audio-Visual Speaker Diarization
The scarcity of labeled audio-visual datasets is a constraint for training superior audio-visual speaker diarization systems. To improve the performance of audio-visual speaker diarization, we leverage pre-tra...
-
Article
Towards High-Resolution Specular Highlight Detection
Specular highlight detection is an essential task with various applications in computer vision. This paper aims to detect specular highlights in single high-resolution images using deep learning while avoiding...
-
Article
ACKSNet: adaptive center keypoint selection for object detection
Keypoint-based detectors generate a large number of false positives due to incorrect keypoint matching in the object detection task. In this paper, we propose an adaptive center keypoint selection method (ACKS...
-
Article
Cluster-based two-branch framework for point cloud attribute compression
Owing to the irregular distribution of point clouds in 3D space, effectively compressing the point cloud is still challenging. Recently, numerous compression methods have been developed with outstanding perfor...
-
Article
ZRDNet: zero-reference image defogging by physics-based decomposition–reconstruction mechanism and perception fusion
This paper investigates challenging fully unsupervised defogging problems, i.e., how to remove fog by feeding only foggy images in deep neural networks rather than using paired or unpaired synthetic images, an...
-
Article
Trade-off background joint learning for unsupervised vehicle re-identification
Existing vehicle re-identification (Re-ID) methods either extract valuable background information to enhance the robustness of the vehicle model or segment background interference information to learn vehicle ...
-
Chapter and Conference Paper
CrowdFusion: Refined Cross-Modal Fusion Network for RGB-T Crowd Counting
Crowd counting is a crucial task in computer vision, offering numerous applications in smart security, remote sensing, agriculture and forestry. While pure image-based models have made significant advancements...
-
Chapter and Conference Paper
MatchFormer: Interleaving Attention in Transformers for Feature Matching
Local feature matching is a computationally intensive task at the subpixel level. While detector-based methods coupled with feature descriptors struggle in low-texture scenes, CNN-based methods with a sequential
-
Article
Countering Malicious DeepFakes: Survey, Battleground, and Horizon
The creation or manipulation of facial appearance through deep generative approaches, known as DeepFake, has achieved significant progress and promoted a wide range of benign and malicious applications, e.g., vi...
-
Article
HybNet: a hybrid network structure for pain intensity estimation
Automatic pain intensity estimation has great potential in current rehabilitation medicine, and patients’ health status information can be obtained through the analysis of facial images. At present, deep convo...
-
Chapter and Conference Paper
Prognostic Staging System for Esophageal Cancer Using Lasso, Cox and CS-SVM
Esophageal cancer is a heterogeneous malignant tumor with high mortality. Constructing an effective prognostic staging system would help to improve the prognosis of patients. In this paper, blood indexe...
-
Chapter and Conference Paper
An Adaptive Weight Joint Loss Optimization for Dog Face Recognition
In recent years, the field of human face recognition has developed rapidly, and a large number of deep learning methods have proven their efficiency in human face recognition. However, these methods do not wor...
-
Chapter and Conference Paper
Lightweight Image Compression Based on Deep Learning
Deep learning-based image compression (DLIC) algorithms have achieved higher compression gain than conventional algorithms. However, the large number of parameters and floating-point operations (FLOPs) of DLIC severely lim...
-
Chapter and Conference Paper
End-to-End Large-Scale Image Retrieval Network with Convolution and Vision Transformers
There has been significant progress in content-based image retrieval with the development of convolutional neural networks and visual transformers. However, there are semantic gaps between high-level semantic ...