-
Chapter and Conference Paper
MaskMel-Prosody-CycleGAN-VC: High-Quality Cross-Lingual Voice Conversion
Voice conversion aims to change the timbre of the source speaker to that of the target speaker without changing the speech content. Cross-lingual voice conversion requires non-parallel training data in two...
-
Chapter and Conference Paper
Fine-Grained Style Control in VITS-Based Text-to-Speech Synthesis
In this paper, a fine-grained style controllable speech synthesis model based on VITS is presented. To achieve fine-grained emotional speech, global and local emotion features are extracted using GST and LST, ...
-
Article
Open Access
Three-stage training and orthogonality regularization for spoken language recognition
Spoken language recognition has made significant progress in recent years, for which automatic speech recognition has been used as a parallel branch to extract phonetic features. However, there is still a lack...
-
Article
Multi-domain Attention Fusion Network For Language Recognition
Attention-based convolutional neural network models are increasingly adopted for language recognition tasks. In this paper, based on the self-attention mechanism, we address language recognition by ...
-
Article
Open Access
Masked multi-center angular margin loss for language recognition
Embedding-based language recognition aims to maximize inter-class variance and minimize intra-class variance. Previous research is limited to the training constraint of a single centroid, which cannot ac...
-
Chapter and Conference Paper
WINVC: One-Shot Voice Conversion with Weight Adaptive Instance Normalization
This paper proposes a one-shot voice conversion (VC) solution. In many one-shot voice conversion solutions (e.g., auto-encoder-based VC methods), performance has improved dramatically due to instance no...
-
Article
Trainable back-propagated functional transfer matrices
Functional transfer matrices consist of real functions with trainable parameters. In this work, functional transfer matrices are used to model functional connections in neural networks. Different from linear c...
-
Chapter and Conference Paper
Fast Learning of Deep Neural Networks via Singular Value Decomposition
In this paper, we propose a new fast training methodology for learning of Deep Neural Networks (DNNs) via Singular Value Decomposition (SVD). The fast training methodology uses a supervised pre-adjusting proce...
-
Chapter and Conference Paper
Punctuation Prediction for Chinese Spoken Sentence Based on Model Combination
Punctuation prediction is very important for automatic speech recognition (ASR). It greatly improves the readability of transcripts and user experience, and facilitates following natural language processing ta...
-
Chapter and Conference Paper
Compact WFSA Based Language Model and Its Application in Statistical Machine Translation
The authors explore fast query techniques for n-gram language models (LMs) in statistical machine translation (SMT), and then propose a compact WFSA (weighted finite-state automaton) based LM motivated by the c...