  1. No Access

    Chapter and Conference Paper

    FastFoley: Non-autoregressive Foley Sound Generation Based on Visual Semantics

    Foley sound in movies and TV episodes is of great importance to bring a more realistic feeling to the audience. Traditionally, foley artists need to create the foley sound synchronous with the content occurrin...

    Sipan Li, Luwen Zhang, Chenyu Dong, Haiwei Xue in Man-Machine Speech Communication (2023)

  2. No Access

    Chapter

    Speech Recognition and Text-to-Speech Synthesis

    Automatic speech recognition (ASR) and text-to-speech (TTS) synthesis are two very important modules in human-computer communication. With the development of deep learning, the performance of ASR and TTS has i...

    Lifa Sun, Shiyin Kang, Xunying Liu, Helen Meng in Chinese Language Resources (2023)

  3. Article

    Open Access

    A phenomenographic approach on teacher conceptions of teaching Artificial Intelligence (AI) in K-12 schools

    Artificial intelligence (AI) education for K-12 students is an emerging necessity, owing to the rapid advancement and deployment of AI technologies. It is essential to take teachers’ perspectives into account ...

    King Woon Yau, C. S. Chai, Thomas K. F. Chiu in Education and Information Technologies (2023)

  4. No Access

    Chapter and Conference Paper

    Overview of NLPCC 2022 Shared Task 7: Fine-Grained Dialogue Social Bias Measurement

    This paper presents the overview of the shared task 7, Fine-Grained Dialogue Social Bias Measurement, in NLPCC 2022. In this paper, we introduce the task, explain the construction of the provided dataset, anal...

    Jingyan Zhou, Fei Mi, Helen Meng in Natural Language Processing and Chinese Co… (2022)

  5. No Access

    Chapter and Conference Paper

    Out-of-Scope Domain and Intent Classification through Hierarchical Joint Modeling

    User queries for a real-world dialog system may sometimes fall outside the scope of the system’s capabilities, but appropriate system responses will enable smooth processing throughout the human-computer inter...

    Pengfei Liu, Kun Li, Helen Meng in Conversational AI for Natural Human-Centric Interaction (2022)

  6. No Access

    Chapter and Conference Paper

    Learning Frame-Level Recurrent Neural Networks Representations for Query-by-Example Spoken Term Detection on Mobile Devices

    Recurrent neural networks (RNNs) with long short term memory (LSTM) acoustic model (AM) has achieved state-of-the-art performance in LVCSR. The strong ability in capturing context information makes the acoust...

    Ziwei Zhu, Zhiyong Wu, Runnan Li in Artificial Intelligence and Mobile Service… (2018)

  7. Article

    Preface

    Péter Baranyi, Hassan Charaf, Anna Esposito in Journal on Multimodal User Interfaces (2015)

  8. Article

    Expressive talking avatar synthesis and animation

    Lei Xie, Jia Jia, Helen Meng, Zhigang Deng in Multimedia Tools and Applications (2015)

  9. No Access

    Article

    Generating emphatic speech with hidden Markov model for expressive speech synthesis

    Emphasis plays an important role in expressive speech synthesis in highlighting the focus of an utterance to draw the attention of the listener. As there are only a few emphasized words in a sentence, the prob...

    Zhiyong Wu, Yishuang Ning, Xiao Zang, Jia Jia in Multimedia Tools and Applications (2015)

  10. No Access

    Article

    Acoustic to articulatory mapping with deep neural network

    Synthetic talking avatar has been demonstrated to be very useful in human-computer interactions. In this paper, we discuss the problem of acoustic to articulatory mapping and explore different kinds of models ...

    Zhiyong Wu, Kai Zhao, Xixin Wu, Xinyu Lan, Helen Meng in Multimedia Tools and Applications (2015)

  11. No Access

    Article

    Synthesizing English emphatic speech for multimodal corrective feedback in computer-aided pronunciation training

    Emphasis plays an important role in expressive speech synthesis in highlighting the focus of an utterance to draw the attention of the listener. We present a hidden Markov model (HMM)-based emphatic speech syn...

    Fanbo Meng, Zhiyong Wu, Jia Jia, Helen Meng in Multimedia Tools and Applications (2014)

  12. No Access

    Chapter

    Audiovisual Tools for Phonetic and Articulatory Visualization in Computer-Aided Pronunciation Training

    This paper reviews interactive methods for improving the phonetic competence of subjects in the case of second language learning as well as in the case of speech therapy for subjects suffering from hearing-imp...

    Bernd J. Kröger, Peter Birkholz in Development of Multimodal Interfaces: Acti… (2010)

  13. No Access

    Chapter and Conference Paper

    Initial Experiments on Automatic Story Segmentation in Chinese Spoken Documents Using Lexical Cohesion of Extracted Named Entities

    Story segmentation plays a critical role in spoken document processing. Spoken documents often come in a continuous audio stream without explicit boundaries related to stories or topics. It is important to be ...

    Devon Li, Wai-Kit Lo, Helen Meng in Chinese Spoken Language Processing (2006)

  14. No Access

    Chapter and Conference Paper

    A Corpus-Based Approach for Cooperative Response Generation in a Dialog System

    This paper presents a corpus-based approach for cooperative response generation in a spoken dialog system for the Hong Kong tourism domain. A corpus with 3874 requests and responses is collected using Wizard-o...

    Zhiyong Wu, Helen Meng, Hui Ning, Sam C. Tse in Chinese Spoken Language Processing (2006)

  15. No Access

    Chapter and Conference Paper

    A Cantonese Speech-Driven Talking Face Using Translingual Audio-to-Visual Conversion

    This paper proposes a novel approach towards a video-realistic, speech-driven talking face for Cantonese. We present a technique that realizes a talking face for a target language (Cantonese) using only audio...

    Lei Xie, Helen Meng, Zhi-Qiang Liu in Chinese Spoken Language Processing (2006)

  16. No Access

    Book and Conference Proceedings

    Information Retrieval Technology

    Second Asia Information Retrieval Symposium, AIRS 2005, Jeju Island, Korea, October 13-15, 2005. Proceedings

    Gary Geunbae Lee, Akio Yamada, Helen Meng in Lecture Notes in Computer Science (2005)

  17. Chapter and Conference Paper

    Multi-level Fusion of Audio and Visual Features for Speaker Identification

    This paper explores the fusion of audio and visual evidences through a multi-level hybrid fusion architecture based on dynamic Bayesian network (DBN), which combines model level and decision level fusion to ac...

    Zhiyong Wu, Lianhong Cai, Helen Meng in Advances in Biometrics (2005)

  18. No Access

    Chapter and Conference Paper

    Using Verb Dependency Matching in a Reading Comprehension System

    In this paper, we describe a reading comprehension system. This system can return a sentence in a given document as the answer to a given question. This system applies bag-of-words matching approach as the bas...

    Kui Xu, Helen Meng in Information Retrieval Technology (2005)

  19. No Access

    Chapter and Conference Paper

    A Pruning Approach for GMM-Based Speaker Verification in Mobile Embedded Systems

    This paper presents a pruning approach for minimizing the execution time in the pattern matching process during speaker verification. Specifically, our speaker verification system uses mel-frequency cepstral c...

    Cheung Chi Leung, Yiu Sang Moon, Helen Meng in Biometric Authentication (2004)

  20. No Access

    Chapter

    A Hierarchical Lexical Representation for Pronunciation Generation

    We propose a unified framework for integrating a variety of linguistic knowledge sources for representing the English word, to facilitate their concurrent utilization in language applications. Our hierarchical...

    Helen Meng in Data-Driven Techniques in Speech Synthesis (2001)
