Search Results
-
Speech recognition model design for Sundanese language using WAV2VEC 2.0
Indonesia has a variety of languages, one of which is Sundanese. Sundanese is a regional language from Indonesia that has the potential to become...
-
Wav2vec-AD: Acoustic Unit Discovery Module-Integrated, Self-Supervised Contrastive Pre-training Approach for Speech Recognition
An effective speech recognition model necessitates an ample supply of labeled data for supervised training. However, this proposition poses a...
-
Improving Automatic Speech Recognition for Non-native English with Transfer Learning and Language Model Decoding
ASR systems designed for native English (L1) usually underperform on non-native English (L2). To address this performance gap, (1) we extend our...
-
Self-supervised Learning for Speech Emotion Recognition Task Using Audio-visual Features and Distil Hubert Model on BAVED and RAVDESS Databases
Existing pre-trained models like Distil HuBERT excel at uncovering hidden patterns and facilitating accurate recognition across diverse data types,...
-
Speech Emotion Recognition Using Global-Aware Cross-Modal Feature Fusion Network
Speech emotion recognition (SER) facilitates better interpersonal communication. Emotion is normally present in conversation in many forms, such as...
-
Decoding speech perception from non-invasive brain recordings
Decoding speech from brain activity is a long-awaited goal in both healthcare and neuroscience. Invasive devices have recently led to major...
-
Automatic Speech Recognition of Finnish-Swedish Dialects: A Comparison of Three Cutting-Edge Technologies
This paper explores the performance of two different automatic speech recognition models for the Finnish-Swedish language. The first model, Whisper...
-
Multimodal Recommendation Engine for Advertising Using Object Detection and Natural Language Processing
In today's world, there is an explosion in online advertising due to high levels of activity of users online. With this comes the intrinsic issue of...
-
Quality Assurance for Speech Synthesis with ASR
Autoregressive TTS models are still widely used. Due to their stochastic nature, the output may vary from very good to completely unusable from one...
-
End-to-end ASR framework for Indian-English accent: using speech CNN-based segmentation
The superiority of Automatic Speech Recognition (ASR) has significantly enhanced over time, with a focus from short utterance circumstances to longer...
-
Adaptive Keyword Extraction Service for Turkish
Keyword extraction is one of the important tasks of NLP. In this study, we have implemented a fast and competitive keyword extraction model for...
-
Exploration of Whisper fine-tuning strategies for low-resource ASR
Limited data availability remains a significant challenge for Whisper’s low-resource speech recognition performance, falling short of practical...
-
English Pronunciation Correction Service for Hearing-Impaired People: BETTer, Focusing on the Personalized Speech Model
There has been a growing body of research that explores the need for hearing impaired people. However, English Education aids for the...
-
A Novel Approach to Video Summarization Using AI-GPT and Speech Recognition
In an era where online video data is exploding, there is a growing need for efficient ways to summarize video content. In this paper, a novel...
-
Cross-Lingual Self-training to Learn Multilingual Representation for Low-Resource Speech Recognition
Representation learning or pre-training has shown promising performance for low-resource speech recognition which suffers from the data shortage....
-
ASR Bundestag: A Large-Scale Political Debate Dataset in German
We present ASR Bundestag, a dataset for automatic speech recognition in German, consisting of 610 h of aligned audio-transcript pairs for supervised...
-
Deep learning-based expressive speech synthesis: a systematic review of approaches, challenges, and resources
Speech synthesis has made significant strides thanks to the transition from machine learning to deep learning models. Contemporary text-to-speech...
-
A survey of technologies for automatic Dysarthric speech recognition
Speakers with dysarthria often struggle to accurately pronounce words and effectively communicate with others. Automatic speech recognition (ASR) is...
-
Using Pre-trained Models for Code-Switched Speech Recognition
In various regions of the world, people tend to use a mix of multiple languages in day-to-day communication. The multilingual ASR system must...
-
Identification of Disfluency Among Children Using Efficient Machine Learning Techniques
Disfluency, which refers to any deviation from the anticipated fluency of spoken language, is a considerable issue affecting a substantial number of...