-
Article
Open Access
Benefits of pre-trained mono- and cross-lingual speech representations for spoken language understanding of Dutch dysarthric speech
With the rise of deep learning, spoken language understanding (SLU) for command-and-control applications such as a voice-controlled virtual assistant can offer reliable hands-free operation to physically disab...
-
Article
Open Access
Decoding of the speech envelope from EEG using the VLAAI deep neural network
To investigate the processing of speech in the brain, commonly simple linear models are used to establish a relationship between brain signals and speech features. However, these linear models are ill-equipped...
-
Article
Open Access
Multi-encoder attention-based architectures for sound recognition with partial visual assistance
Large-scale sound recognition data sets typically consist of acoustic recordings obtained from multimedia libraries. As a consequence, modalities other than audio can often be exploited to improve the outputs ...
-
Chapter and Conference Paper
An Equal Data Setting for Attention-Based Encoder-Decoder and HMM/DNN Models: A Case Study in Finnish ASR
Standard end-to-end training of attention-based ASR models only uses transcribed speech. If they are compared to HMM/DNN systems, which additionally leverage a large corpus of text-only data and expert-crafted...
-
Article
Open Access
Show me where the action is!
Reality TV shows have gained popularity, motivating many production houses to bring new variants for us to watch. Compared to traditional TV shows, reality TV shows have spontaneous unscripted footage. Compute...
-
Chapter and Conference Paper
The CAMETRON Lecture Recording System: High Quality Video Recording and Editing with Minimal Human Supervision
In this paper, we demonstrate a system that automates the process of recording video lectures in classrooms. Through special hardware (lecturer and audience facing cameras and microphone arrays), we record mul...
-
Chapter and Conference Paper
Automatic Smoker Detection from Telephone Speech Signals
This paper proposes an automatic smoking habit detection from spontaneous telephone speech signals. In this method, each utterance is modeled using i-vector and non-negative factor analysis (NFA) frameworks, w...
-
Article
Open Access
The self-taught vocal interface
Speech technology is firmly rooted in daily life, most notably in command-and-control (C&C) applications. C&C usability downgrades quickly, however, when used by people with non-standard speech. We pursue a fu...
-
Chapter and Conference Paper
Label Noise Robustness and Learning Speed in a Self-Learning Vocal User Interface
A self-learning vocal user interface learns to map user-defined spoken commands to intended actions. The voice user interface is trained by mining the speech input and the provoked action on a device. Although...
-
Chapter
Missing Data Solutions for Robust Speech Recognition
Current automatic speech recognisers rely for a great deal on statistical models learned from training data. When they are deployed in conditions that differ from those observed in the training data, the gener...
-
Chapter
The JASMIN Speech Corpus: Recordings of Children, Non-natives and Elderly People
Large speech corpora (LSC) constitute an indispensable resource for conducting research in speech processing and for developing real-life speech applications. In 2004 the Spoken Dutch Corpus (Corpus Gesproken ...
-
Article
Open Access
Multi-candidate missing data imputation for robust speech recognition
The application of Missing Data Techniques (MDT) to increase the noise robustness of HMM/GMM-based large vocabulary speech recognizers is hampered by a large computational burden. The likelihood evaluations im...
-
Article
Human language technology and communicative disabilities: requirements and possibilities for the future
For some years now, the Nederlandse Taalunie (Dutch Language Union) has been active in promoting the development of human language technology (HLT) applications for speakers of Dutch with communicative disabil...
-
Chapter and Conference Paper
An On-Line NMF Model for Temporal Pattern Learning: Theory with Application to Automatic Speech Recognition
Convolutional non-negative matrix factorization (CNMF) can be used to discover recurring temporal (sequential) patterns in sequential vector non-negative data such as spectrograms or posteriorgrams. Drawbacks ...
-
Article
Sparse conjugate directions pursuit with application to fixed-size kernel models
This work studies an optimization scheme for computing sparse approximate solutions of over-determined linear systems. Sparse Conjugate Directions Pursuit (SCDP) aims to construct a solution using only a small...
-
Chapter and Conference Paper
Gaussian Selection Using Self-Organizing Map for Automatic Speech Recognition
The Self-Organizing Map (SOM) is widely applied for data clustering and visualization. In this paper, it is used to cluster Gaussians within the Hidden Markov Model (HMM) of the acoustic model for automatic sp...
-
Chapter
Automatic Speech Recognition Using Missing Data Techniques: Handling of Real-World Data
In this chapter, we investigate the performance of a missing data recognizer on real-world speech from the SPEECON and SpeechDat-Car databases. In previous work we hypothesized that in real-world speech, which...
-
Chapter and Conference Paper
On a Computational Model for Language Acquisition: Modeling Cross-Speaker Generalisation
The discovery of words by young infants involves two interrelated processes: (a) the detection of recurrent word-like acoustic patterns in the speech signal, and (b) cross-modal association between auditory an...
-
Article
Open Access
A Review of Signal Subspace Speech Enhancement and Its Application to Noise Robust Speech Recognition
The objective of this paper is threefold: (1) to provide an extensive review of signal subspace speech enhancement, (2) to derive an upper bound for the performance of these techniques, and (3) to present a co...