Search Results - Springer


Chapter and Conference Paper

FastFoley: Non-autoregressive Foley Sound Generation Based on Visual Semantics

Foley sound in movies and TV episodes is of great importance to bring a more realistic feeling to the audience. Traditionally, foley artists need to create the foley sound synchronous with the content occurrin...

Sipan Li, Luwen Zhang, Chenyu Dong, Haiwei Xue… in Man-Machine Speech Communication (2023)
Chapter

Speech Recognition and Text-to-Speech Synthesis

Automatic speech recognition (ASR) and text-to-speech (TTS) synthesis are two very important modules in human-computer communication. With the development of deep learning, the performance of ASR and TTS has i...

Lifa Sun, Shiyin Kang, Xunying Liu, Helen Meng in Chinese Language Resources (2023)
Article

Open Access

A phenomenographic approach on teacher conceptions of teaching Artificial Intelligence (AI) in K-12 schools

Artificial intelligence (AI) education for K-12 students is an emerging necessity, owing to the rapid advancement and deployment of AI technologies. It is essential to take teachers’ perspectives into account ...

King Woon Yau, C. S. Chai, Thomas K. F. Chiu… in Education and Information Technologies (2023)

Chapter and Conference Paper

Overview of NLPCC 2022 Shared Task 7: Fine-Grained Dialogue Social Bias Measurement

This paper presents the overview of the shared task 7, Fine-Grained Dialogue Social Bias Measurement, in NLPCC 2022. In this paper, we introduce the task, explain the construction of the provided dataset, anal...

Jingyan Zhou, Fei Mi, Helen Meng… in Natural Language Processing and Chinese Co… (2022)
Chapter and Conference Paper

Out-of-Scope Domain and Intent Classification through Hierarchical Joint Modeling

User queries for a real-world dialog system may sometimes fall outside the scope of the system’s capabilities, but appropriate system responses will enable smooth processing throughout the human-computer inter...

Pengfei Liu, Kun Li, Helen Meng in Conversational AI for Natural Human-Centric Interaction (2022)
Chapter and Conference Paper

Learning Frame-Level Recurrent Neural Networks Representations for Query-by-Example Spoken Term Detection on Mobile Devices

Recurrent neural networks (RNNs) with long short term memory (LSTM) acoustic model (AM) has achieved state-of-the-art performance in LVCSR. The strong ability in capturing context information makes the acoust...

Ziwei Zhu, Zhiyong Wu, Runnan Li… in Artificial Intelligence and Mobile Service… (2018)
Article

Preface

Péter Baranyi, Hassan Charaf, Anna Esposito… in Journal on Multimodal User Interfaces (2015)

Article

Expressive talking avatar synthesis and animation

Lei Xie, Jia Jia, Helen Meng, Zhigang Deng… in Multimedia Tools and Applications (2015)

Article

Generating emphatic speech with hidden Markov model for expressive speech synthesis

Emphasis plays an important role in expressive speech synthesis in highlighting the focus of an utterance to draw the attention of the listener. As there are only a few emphasized words in a sentence, the prob...

Zhiyong Wu, Yishuang Ning, Xiao Zang, Jia Jia… in Multimedia Tools and Applications (2015)
Article

Acoustic to articulatory mapping with deep neural network

Synthetic talking avatar has been demonstrated to be very useful in human-computer interactions. In this paper, we discuss the problem of acoustic to articulatory mapping and explore different kinds of models ...

Zhiyong Wu, Kai Zhao, Xixin Wu, Xinyu Lan, Helen Meng in Multimedia Tools and Applications (2015)
Article

Synthesizing English emphatic speech for multimodal corrective feedback in computer-aided pronunciation training

Emphasis plays an important role in expressive speech synthesis in highlighting the focus of an utterance to draw the attention of the listener. We present a hidden Markov model (HMM)-based emphatic speech syn...

Fanbo Meng, Zhiyong Wu, Jia Jia, Helen Meng… in Multimedia Tools and Applications (2014)
Chapter

Audiovisual Tools for Phonetic and Articulatory Visualization in Computer-Aided Pronunciation Training

This paper reviews interactive methods for improving the phonetic competence of subjects in the case of second language learning as well as in the case of speech therapy for subjects suffering from hearing-imp...

Bernd J. Kröger, Peter Birkholz… in Development of Multimodal Interfaces: Acti… (2010)
Chapter and Conference Paper

Initial Experiments on Automatic Story Segmentation in Chinese Spoken Documents Using Lexical Cohesion of Extracted Named Entities

Story segmentation plays a critical role in spoken document processing. Spoken documents often come in a continuous audio stream without explicit boundaries related to stories or topics. It is important to be ...

Devon Li, Wai-Kit Lo, Helen Meng in Chinese Spoken Language Processing (2006)
Chapter and Conference Paper

A Corpus-Based Approach for Cooperative Response Generation in a Dialog System

This paper presents a corpus-based approach for cooperative response generation in a spoken dialog system for the Hong Kong tourism domain. A corpus with 3874 requests and responses is collected using Wizard-o...

Zhiyong Wu, Helen Meng, Hui Ning, Sam C. Tse in Chinese Spoken Language Processing (2006)
Chapter and Conference Paper

A Cantonese Speech-Driven Talking Face Using Translingual Audio-to-Visual Conversion

This paper proposes a novel approach towards a video-realistic, speech-driven talking face for Cantonese. We present a technique that realizes a talking face for a target language (Cantonese) using only audio...

Lei Xie, Helen Meng, Zhi-Qiang Liu in Chinese Spoken Language Processing (2006)
Book and Conference Proceedings

Information Retrieval Technology

Second Asia Information Retrieval Symposium, AIRS 2005, Jeju Island, Korea, October 13-15, 2005. Proceedings

Gary Geunbae Lee, Akio Yamada, Helen Meng… in Lecture Notes in Computer Science (2005)
Chapter and Conference Paper

Multi-level Fusion of Audio and Visual Features for Speaker Identification

This paper explores the fusion of audio and visual evidences through a multi-level hybrid fusion architecture based on dynamic Bayesian network (DBN), which combines model level and decision level fusion to ac...

Zhiyong Wu, Lianhong Cai, Helen Meng in Advances in Biometrics (2005)

Chapter and Conference Paper

Using Verb Dependency Matching in a Reading Comprehension System

In this paper, we describe a reading comprehension system. This system can return a sentence in a given document as the answer to a given question. This system applies bag-of-words matching approach as the bas...

Kui Xu, Helen Meng in Information Retrieval Technology (2005)
Chapter and Conference Paper

A Pruning Approach for GMM-Based Speaker Verification in Mobile Embedded Systems

This paper presents a pruning approach for minimizing the execution time in the pattern matching process during speaker verification. Specifically, our speaker verification system uses mel-frequency cepstral c...

Cheung Chi Leung, Yiu Sang Moon, Helen Meng in Biometric Authentication (2004)
Chapter

A Hierarchical Lexical Representation for Pronunciation Generation

We propose a unified framework for integrating a variety of linguistic knowledge sources for representing the English word, to facilitate their concurrent utilization in language applications. Our hierarchical...

Helen Meng in Data-Driven Techniques in Speech Synthesis (2001)

21 Result(s)
