Search Results - Springer

Chapter and Conference Paper

MaskMel-Prosody-CycleGAN-VC: High-Quality Cross-Lingual Voice Conversion

Voice conversion aims to change the timbre of the source speaker to that of the target speaker without changing the speech content. The cross-lingual voice conversion requires non-parallel training data in two...

Siqi Yan, Senda Chen, Yanyan Xu, Dengfeng Ke in Proceedings of 3rd International Conferenc… (2024)
Chapter and Conference Paper

Fine-Grained Style Control in VITS-Based Text-to-Speech Synthesis

In this paper, a fine-grained style controllable speech synthesis model based on VITS is presented. To achieve fine-grained emotional speech, global and local emotion features are extracted using GST and LST, ...

Zhong Huihang, Dengfeng Ke, Li Ya, Wenhan Yao, Wenqian Bao in Computer Applications (2024)
Article

Open Access

Three-stage training and orthogonality regularization for spoken language recognition

Spoken language recognition has made significant progress in recent years, for which automatic speech recognition has been used as a parallel branch to extract phonetic features. However, there is still a lack...

Zimu Li, Yanyan Xu, Dengfeng Ke, Kaile Su in EURASIP Journal on Audio, Speech, and Musi… (2023)

Article

Multi-domain Attention Fusion Network For Language Recognition

Attention-based convolutional neural network models are increasingly adopted for language recognition tasks. In this paper, based on the self-attention mechanism, we solve the study of language recognition by ...

Minghang Ju, Yanyan Xu, Dengfeng Ke, Kaile Su in SN Computer Science (2022)
Article

Open Access

Masked multi-center angular margin loss for language recognition

Language recognition based on embedding aims to maximize inter-class variance and minimize intra-class variance. Previous researches are limited to the training constraint of a single centroid, which cannot ac...

Minghang Ju, Yanyan Xu, Dengfeng Ke… in EURASIP Journal on Audio, Speech, and Musi… (2022)

Chapter and Conference Paper

WINVC: One-Shot Voice Conversion with Weight Adaptive Instance Normalization

This paper proposes a one-shot voice conversion (VC) solution. In many one-shot voice conversion solutions (e.g., Auto-encoder-based VC methods), performances have dramatically been improved due to instance no...

Shengjie Huang, Mingjie Chen, Yanyan Xu… in PRICAI 2021: Trends in Artificial Intellig… (2021)
Article

Trainable back-propagated functional transfer matrices

Functional transfer matrices consist of real functions with trainable parameters. In this work, functional transfer matrices are used to model functional connections in neural networks. Different from linear c...

Cheng-Hao Cai, Yanyan Xu, Dengfeng Ke, Kaile Su, Jing Sun in Applied Intelligence (2019)
Chapter and Conference Paper

Fast Learning of Deep Neural Networks via Singular Value Decomposition

In this paper, we propose a new fast training methodology for learning of Deep Neural Networks (DNNs) via Singular Value Decomposition (SVD). The fast training methodology uses a supervised pre-adjusting proce...

Chenghao Cai, Dengfeng Ke, Yanyan Xu… in PRICAI 2014: Trends in Artificial Intellig… (2014)
Chapter and Conference Paper

Punctuation Prediction for Chinese Spoken Sentence Based on Model Combination

Punctuation prediction is very important for automatic speech recognition (ASR). It greatly improves the readability of transcripts and user experience, and facilitates following natural language processing ta...

Xiao Chen, Dengfeng Ke, Bo Xu in Practical Applications of Intelligent Systems (2014)
Chapter and Conference Paper

Compact WFSA Based Language Model and Its Application in Statistical Machine Translation

The authors explore the fast query techniques for n-gram language model (LM) in statistical machine translation (SMT), and then propose a compact WFSA (weighted finite-state automaton) based LM motivated by the c...

Xiaoyin Fu, Wei Wei, Shixiang Lu… in Natural Language Processing and Chinese Co… (2012)

10 Result(s)
