Search Results - Springer

Chapter and Conference Paper

End-to-End Streaming Customizable Keyword Spotting Based on Text-Adaptive Neural Search

Streaming keyword spotting (KWS) is an important technique for voice assistant wake-up. While KWS with a preset fixed keyword has been well studied, test-time customizable keyword spotting in streaming mode re...

Baochen Yang, Jiaqi Guo, Haoyu Li, Yu **, Qing Zhuo… in Man-Machine Speech Communication (2024)

Chapter and Conference Paper

3RE-Net: Joint Loss-REcovery and Super-REsolution Neural Network for REal-Time Video

Real-time video over the Internet suffers from packet loss and low network bandwidth. The receiving side may receive down-sampled video with damaged frames. In this work, we are motivated to enhance the qualit...

Liming Ge, David Zhaochen Jiang, Wei Bao in AI 2023: Advances in Artificial Intelligence (2024)

Chapter and Conference Paper

Preliminary Experiment for Measuring the Anxiety Level Using Heart Rate Variability

Anxiety is one of the most significant health issues. Generally, there are four levels of anxiety: mild anxiety, moderate anxiety, severe anxiety, and panic level anxiety

Haochen He, Chen Feng, Peeraya Sripian… in Virtual, Augmented and Mixed Reality (2023)

Chapter and Conference Paper

Power Efficient Video Super-Resolution on Mobile NPUs with Deep Learning, Mobile AI & AIM 2022 Challenge: Report

Video super-resolution is one of the most popular tasks on mobile devices, being widely used for an automatic improvement of low-bitrate and low-resolution video streams. While numerous solutions have been pro...

Andrey Ignatov, Radu Timofte, Cheng-Ming Chiang… in Computer Vision – ECCV 2022 Workshops (2023)

Chapter and Conference Paper

Semantic Enhancement Framework for Robust Speech Recognition

Auto speech recognition (ASR) has been widely used in dialogue systems of various domains, performing as a crucial part of technology. Since the output of the ASR system will provide input to the subsequent sy...

Baochen Yang, Kai Yu in Man-Machine Speech Communication (2023)

Chapter and Conference Paper

A Fast Stain Normalization Network for Cervical Papanicolaou Images

The domain shift between different styles of stain images greatly challenges the generalization of computer-aided diagnosis (CAD) algorithms. To bridge the gap, color normalization is a prerequisite for most C...

Jiawei Cao, Changsheng Lu, Kaijie Wu, Chaochen Gu in Neural Information Processing (2023)

Chapter and Conference Paper

Single Cross-domain Semantic Guidance Network for Multimodal Unsupervised Image Translation

Multimodal image-to-image translation has received great attention due to its flexibility and practicality. The existing methods lack the generality of effective style representation, and cannot capture differ...

Jiaying Lan, Lianglun Cheng, Guoheng Huang, Chi-Man Pun… in MultiMedia Modeling (2023)

Chapter and Conference Paper

VERTEX: VEhicle Reconstruction and TEXture Estimation from a Single Image Using Deep Implicit Semantic Template Mapping

We introduce VERTEX, an effective solution to recovering the 3D shape and texture of vehicles from uncalibrated monocular inputs under real-world street environments. To fully utilize the semantic prior of veh...

Xiaochen Zhao, Zerong Zheng, Chaonan Ji, Zhenyi Liu, Siyou Lin… in Artificial Intelligence (2022)

Chapter and Conference Paper

A Deep Attention Transformer Network for Pain Estimation with Facial Expression Video

Since pain often causes deformations in the facial structure, analysis of facial expressions has received considerable attention for automatic pain estimation in recent years. This study proposes a deep attent...

Haochen Xu, Manhua Liu in Biometric Recognition (2021)

Chapter and Conference Paper

Integrating Task Information into Few-Shot Classifier by Channel Attention

It has been increasingly recognized that meta-learning-based approaches provide a promising way to handle challenges to few-shot learning. In this paper, we incorporate the channel attention in the main framew...

Zhaochen Li, Kedian Mu in Knowledge Science, Engineering and Management (2021)

Chapter and Conference Paper

Stacked Sparse Autoencoder for Audio Object Coding

Compared with channel-based audio coding, the object-based audio coding has a definite advantage in meeting the user’s demands of personalized control. However, in the conventional Spatial Audio Object Coding ...

Yulin Wu, Ruimin Hu, Xiaochen Wang, Chenhao Hu, Gang Li in MultiMedia Modeling (2021)

Chapter and Conference Paper

A Metagraph-Based Model for Predicting Drug-Target Interaction on Heterogeneous Network

Determining drug-target interactions (DTIs) is an important task in drug discovery and drug relocalization. Currently, different models have been proposed to predict the potential interactions between drugs an...

Peng Ke, Yuqi Wen, Zhongnan Zhang, Song He… in Artificial Neural Networks and Machine Lea… (2021)

Chapter and Conference Paper

EMRM: Enhanced Multi-source Review-Based Model for Rating Prediction

Rating prediction, whose goal is to predict user preference for unconsumed items, has become one of the core tasks in recommendation systems. Recently, many deep learning-based methods have been applied to the...

Xiaochen Wang, Tingsong Xiao, Jie Shao in Knowledge Science, Engineering and Management (2021)

Chapter and Conference Paper

Metric Learning for Categorical and Ambiguous Features: An Adversarial Method

Metric learning learns a distance metric from data and has significantly improved the classification accuracy of distance-based classifiers such as k-nearest neighbors. However, metric learning has rarely been ap...

Xiaochen Yang, Mingzhi Dong, Yiwen Guo… in Machine Learning and Knowledge Discovery i… (2021)

Chapter and Conference Paper

Multi-step Coding Structure of Spatial Audio Object Coding

The spatial audio object coding (SAOC) is an effective method which compresses multiple audio objects and provides flexibility for personalized rendering in interactive services. It divides each frame signal ...

Chenhao Hu, Ruimin Hu, Xiaochen Wang, Tingzhao Wu, Dengshi Li in MultiMedia Modeling (2020)

Chapter and Conference Paper

Synthesizing Large-Scale Datasets for License Plate Detection and Recognition in the Wild

License Plate Detection and Recognition (LPDR) plays a key role in modern intelligent transportation systems. Recent state-of-the-art methods of LPDR are based on deep convolutional neural networks (DCNN), whi...

Chaochen Wang, Wenzhong Wang, Chenglong Li… in Pattern Recognition and Computer Vision (2020)

Chapter and Conference Paper

Imputation of Incomplete Data Based on Attribute Cross Fitting Model and Iterative Missing Value Variables

The problem of missing values is often encountered in tasks such as machine learning, and imputation of missing values has become an important research content in incomplete data analysis. In this paper, we p...

**chong Zhu, Liyong Zhang, Xiaochen Lai… in Advances in Neural Networks – ISNN 2020 (2020)

Chapter and Conference Paper

Perceptual Localization of Virtual Sound Source Based on Loudspeaker Triplet

When using a loudspeaker triplet for virtual sound localization, the traditional conversion method will result in inaccurate localization. In this paper, we constructed a perceptual localization distortion mod...

Duanzheng Guan, Dengshi Li, Xuebei Cai, Xiaochen Wang, Ruimin Hu in MultiMedia Modeling (2020)

Chapter and Conference Paper

HMM-Based Person Re-identification in Large-Scale Open Scenario

This paper aims to tackle person re-identification (person re-ID) in large-scale open scenario, which differs from the conventional person re-ID tasks but is significant for some real suspect investigation ca...

Dongyang Li, Ruimin Hu, Wenxin Huang, Xiaochen Wang, Dengshi Li… in MultiMedia Modeling (2020)

Chapter and Conference Paper

NormalGAN: Learning Detailed 3D Human from a Single RGB-D Image

We propose NormalGAN, a fast adversarial learning-based method to reconstruct the complete and detailed 3D human from a single RGB-D image. Given a single front-view RGB-D image, NormalGAN performs two steps: ...

Lizhen Wang, Xiaochen Zhao, Tao Yu, Songtao Wang, Yebin Liu in Computer Vision – ECCV 2020 (2020)

64 Result(s)

