Search Page | SpringerLink

Speech recognition model design for Sundanese language using WAV2VEC 2.0

Indonesia has a variety of languages, one of which is Sundanese. Sundanese is a regional language from Indonesia that has the potential to become...

Albert Cryssiover, Amalia Zahra in International Journal of Speech Technology

Article 14 March 2024

Wav2vec-AD: Acoustic Unit Discovery Module-Integrated, Self-Supervised Contrastive Pre-training Approach for Speech Recognition

An effective speech recognition model necessitates an ample supply of labeled data for supervised training. However, this proposition poses a...

Yolwas Nurmemet, Lixu Sun, ... Zhixiang Wang in Journal of Shanghai Jiaotong University (Science)

Article 10 May 2024

Improving Automatic Speech Recognition for Non-native English with Transfer Learning and Language Model Decoding

ASR systems designed for native English (L1) usually underperform on non-native English (L2). To address this performance gap, (1) we extend our...

Peter Sullivan, Toshiko Shibano, Muhammad Abdul-Mageed in Analysis and Application of Natural Language and Speech Processing

Chapter 2023

Self-supervised Learning for Speech Emotion Recognition Task Using Audio-visual Features and Distil Hubert Model on BAVED and RAVDESS Databases

Existing pre-trained models like Distil HuBERT excel at uncovering hidden patterns and facilitating accurate recognition across diverse data types,...

Karim Dabbabi, Abdelkarim Mars in Journal of Systems Science and Systems Engineering

Article 29 May 2024

Speech Emotion Recognition Using Global-Aware Cross-Modal Feature Fusion Network

Speech emotion recognition (SER) facilitates better interpersonal communication. Emotion is normally present in conversation in many forms, such as...

Feng Li, Jiusong Luo in Advanced Intelligent Computing Technology and Applications

Conference paper 2023

Decoding speech perception from non-invasive brain recordings

Decoding speech from brain activity is a long-awaited goal in both healthcare and neuroscience. Invasive devices have recently led to major...

Alexandre Défossez, Charlotte Caucheteux, ... Jean-Rémi King in Nature Machine Intelligence

Article Open access 05 October 2023

Automatic Speech Recognition of Finnish-Swedish Dialects: A Comparison of Three Cutting-Edge Technologies

This paper explores the performance of two different automatic speech recognition models for the Finnish-Swedish language. The first model, Whisper...

Leonardo Espinosa-Leal, Kristoffer Kuvaja Adolfsson, Andrey Shcherbakov in Smart Technologies for a Sustainable Future

Conference paper 2024

Multimodal Recommendation Engine for Advertising Using Object Detection and Natural Language Processing

In today's world, there is an explosion in online advertising due to high levels of activity of users online. With this comes the intrinsic issue of...

S. Rajarajeswari, Manas P. Shankar, ... Manish Manohar in Advances in Data-driven Computing and Intelligent Systems

Conference paper 2023

Quality Assurance for Speech Synthesis with ASR

Autoregressive TTS models are still widely used. Due to their stochastic nature, the output may vary from very good to completely unusable from one...

René Peinl, Johannes Wirth in Intelligent Systems and Applications

Conference paper 2023

End-to-end ASR framework for Indian-English accent: using speech CNN-based segmentation

The superiority of Automatic Speech Recognition (ASR) has significantly enhanced over time, with a focus from short utterance circumstances to longer...

Ghayas Ahmed, Aadil Ahmad Lawaye in International Journal of Speech Technology

Article 11 November 2023

Adaptive Keyword Extraction Service for Turkish

Keyword extraction is one of the important tasks of NLP. In this study, we have implemented a fast and competitive keyword extraction model for...

H. Yavuz Erzurumlu, Yusuf Sinan Akgul in Intelligent Systems and Applications

Conference paper 2023

Exploration of Whisper fine-tuning strategies for low-resource ASR

Limited data availability remains a significant challenge for Whisper’s low-resource speech recognition performance, falling short of practical...

Yunpeng Liu, Xukui Yang, Dan Qu in EURASIP Journal on Audio, Speech, and Music Processing

Article Open access 01 June 2024

English Pronunciation Correction Service for Hearing-Impaired People: BETTer, Focusing on the Personalized Speech Model

There has been a growing body of research that explores the need for hearing impaired people. However, English Education aids for the...

Seon Hong Park, Hyun ** Park, ... Ill Chul Doo in Advances in Computer Science and Ubiquitous Computing

Conference paper 2023

A Novel Approach to Video Summarization Using AI-GPT and Speech Recognition

In an era where online video data is exploding, there is a growing need for efficient ways to summarize video content. In this paper, a novel...

B. P. Aniruddha Prabhu, Tushar Sharma, ... M. S. Guru Prasad in Data Science and Applications

Conference paper 2024

Cross-Lingual Self-training to Learn Multilingual Representation for Low-Resource Speech Recognition

Representation learning or pre-training has shown promising performance for low-resource speech recognition which suffers from the data shortage....

Zi-Qiang Zhang, Yan Song, ... Li-Rong Dai in Circuits, Systems, and Signal Processing

Article 23 July 2022

ASR Bundestag: A Large-Scale Political Debate Dataset in German

We present ASR Bundestag, a dataset for automatic speech recognition in German, consisting of 610 h of aligned audio-transcript pairs for supervised...

Johannes Wirth, René Peinl in Intelligent Systems and Applications

Conference paper 2024

Deep learning-based expressive speech synthesis: a systematic review of approaches, challenges, and resources

Speech synthesis has made significant strides thanks to the transition from machine learning to deep learning models. Contemporary text-to-speech...

Huda Barakat, Oytun Turk, Cenk Demiroglu in EURASIP Journal on Audio, Speech, and Music Processing

Article Open access 12 February 2024

A survey of technologies for automatic Dysarthric speech recognition

Speakers with dysarthria often struggle to accurately pronounce words and effectively communicate with others. Automatic speech recognition (ASR) is...

Zhaopeng Qian, Ke**g **ao, Chongchong Yu in EURASIP Journal on Audio, Speech, and Music Processing

Article Open access 11 November 2023

Using Pre-trained Models for Code-Switched Speech Recognition

In various regions of the world, people tend to use a mix of multiple languages in day-to-day communication. The multilingual ASR system must...

P. Vasuki, Ujjwaleshwar Srikanth, Vijay Sankarnarayanan in Advances in Data-Driven Computing and Intelligent Systems

Conference paper 2024

Identification of Disfluency Among Children Using Efficient Machine Learning Techniques

Disfluency, which refers to any deviation from the anticipated fluency of spoken language, is a considerable issue affecting a substantial number of...

R. Pallavi Reddy, N. Kalyani, ... B. Tharuni in Proceedings of the 6th International Conference on Communications and Cyber Physical Engineering

Conference paper 2024
