Search Page | SpringerLink

Speech-to-SQL: toward speech-driven SQL query generation from natural language question

Speech-based inputs have been gaining significant momentum with the popularity of smartphones and tablets in our daily lives, since voice is the most...

Yuanfeng Song, Raymond Chi-Wing Wong, Xuefang Zhao in The VLDB Journal

Article 16 February 2024

Diversity subspace generation based on feature selection for speech emotion recognition

Automatic emotion recognition from speech signals is an important research area. Many speech emotion recognition (SER) methods have been proposed,...

Qing Ye, Yaxin Sun in Multimedia Tools and Applications

Article 17 August 2023

Fusion-s2igan: an efficient and effective single-stage framework for speech-to-image generation

The goal of a speech-to-image transform is to produce a photo-realistic picture directly from a speech signal. Current approaches are based on a...

Zhenxing Zhang, Lambert Schomaker in Neural Computing and Applications

Article Open access 19 March 2024

Co-speech Gesture Generation with Variational Auto Encoder

The research field of generating natural gestures from speech input is called co-speech gesture generation. Co-speech generation methods should...

Shinichi Ka, Koichi Shinoda in MultiMedia Modeling

Conference paper 2024

Shallow Diffusion Motion Model for Talking Face Generation from Speech

Talking face generation is synthesizing a lip synchronized talking face video by inputting an arbitrary face image and audio clips. People naturally...

Xulong Zhang, Jianzong Wang, ... Jing Xiao in Web and Big Data

Conference paper 2023

Gammatonegram representation for end-to-end dysarthric speech processing tasks: speech recognition, speaker identification, and intelligibility assessment

Dysarthria is a disability that causes a disturbance in the human speech system and reduces the quality and intelligibility of a person’s speech....

Aref Farhadipour, Hadi Veisi in Iran Journal of Computer Science

Article 10 March 2024

Improvement of automatic speech recognition systems utilizing 2D adaptive wavelet transformation applied to recurrence plot of speech trajectories

Spectral-based features, typically used in ASR systems, do not capture the phase information of speech signals. Thus, exploiting new features that do...

Shabnam Firooz, Farshad Almasganj, Yasser Shekofteh in Signal, Image and Video Processing

Article 15 December 2023

CommanderUAP: a practical and transferable universal adversarial attacks on speech recognition models

Most of the adversarial attacks against speech recognition systems focus on specific adversarial perturbations, which are generated by adversaries...

Zheng Sun, Jinxiao Zhao, ... Lei Ju in Cybersecurity

Article Open access 05 June 2024

Speech waveform reconstruction from speech parameters for an effective text to speech synthesis system using minimum phase harmonic sinusoidal model for Punjabi

Speech processing plays a vital role in current speech communication applications. The major objective of digital speech is transmission of messages...

Navdeep Kaur, Parminder Singh in Multimedia Tools and Applications

Article 25 March 2022

Speech Enhancement with Generative Diffusion Models

An alternative approach to speech denoising using generative diffusion models that model the distribution of training data is proposed. In...

O. V. Girfanov, A. G. Shishkin in Automatic Documentation and Mathematical Linguistics

Article 01 October 2023

A multitask co-training framework for improving speech translation by leveraging speech recognition and machine translation tasks

End-to-end speech translation (ST) has attracted substantial attention due to its less error accumulation and lower latency. Based on triplet ST data ...

Yue Zhou, Yuxuan Yuan, Xiaodong Shi in Neural Computing and Applications

Article 27 February 2024

Design and Implementation of Speech Generation and Demonstration Research Based on Deep Learning

Aiming at complex and changeable factors such as speech theme and environment, which make it difficult for a speaker to prepare the speech text in a...

Wanyu Luo, Yanqing Wang, ... Yiqin Xu in Data Science

Conference paper 2023

Audio-guided self-supervised learning for disentangled visual speech representations

In this paper, we propose a novel two-branch framework to learn the disentangled visual speech representations based on two particular observations....

Dalu Feng, Shuang Yang, ... Xilin Chen in Frontiers of Computer Science

Article 25 June 2024

Deep Learning-Based Acoustic Feature Representations for Dysarthric Speech Recognition

Dysarthria is a motor speech disorder and the most common neurodegenerative disease characterized by low volume in precise articulation, poor...

M. Latha, M. Shivakumar, ... M. Keerthi Kumar in SN Computer Science

Article 20 March 2023

Detecting Speech Disorders Using A Machine-Learning Guided Method in Spontaneous Tunisian Dialect Speech

This work investigates the disfluencies processing task within the natural spoken language comprehension field. We present a transcription-based...

Emna Boughariou, Younès Bahou, Lamia Hadrich Belguith in SN Computer Science

Article 17 April 2024

Speech based emotion recognition by using a faster region-based convolutional neural network

Automatic emotion identification from speech is a difficult problem that significantly depends on the accuracy of the speech characteristics employed...

Chappidi Suneetha, Raju Anitha in Multimedia Tools and Applications

Article 02 April 2024

WaveVC: Speech and Fundamental Frequency Consistent Raw Audio Voice Conversion

Voice conversion (VC) is a task for changing the speech of a source speaker to the target voice while preserving linguistic information of the source...

Kyungdeuk Ko, Donghyeon Kim, ... Hanseok Ko in Neural Processing Letters

Article Open access 08 May 2024

A novel conversational hierarchical attention network for speech emotion recognition in dyadic conversation

Speech is one of the most fundamental mediums for human-to-human interaction, thereby playing a pivotal role in shaping the landscape of...

Mohammed Tellai, Lijian Gao, ... Mounir Abdelaziz in Multimedia Tools and Applications

Article 29 December 2023

Short Speech Key Generation Technology Based on Deep Learning

With the increasing popularity of biometric identity authentication in important key authentication applications, biometric key generation technology...

Zhengyin Lv, Zhendong Wu, Juan Chen in Machine Learning for Cyber Security

Conference paper 2023

Robust Automatic Speech Recognition Using Wavelet-Based Adaptive Wavelet Thresholding: A Review

Automatic speech recognition (ASR) is one of the most fascinating fields of research and the performance of ASR systems is most promising in a closed...

Mahadevaswamy Shanthamallappa, Kiran Puttegowda, ... Sudheesh Kannur Vasudeva Rao in SN Computer Science

Article 01 February 2024
