Search Page | SpringerLink

Adaptive multi-task learning for speech to text translation

End-to-end speech to text translation aims to directly translate speech from one language into text in another, posing a challenging cross-modal task...

Xin Feng, Yue Zhao, ... Xiaona Xu in EURASIP Journal on Audio, Speech, and Music Processing

Article Open access 13 July 2024

Fake speech detection using VGGish with attention block

While deep learning technologies have made remarkable progress in generating deepfakes, their misuse has become a well-known concern. As a result,...

Tahira Kanwal, Rabbia Mahum, ... Haseeb Hassan in EURASIP Journal on Audio, Speech, and Music Processing

Article Open access 26 June 2024

GLFER-Net: a polyphonic sound source localization and detection network based on global-local feature extraction and recalibration

Polyphonic sound source localization and detection (SSLD) task aims to recognize the categories of sound events, identify their onset and offset...

Mengzhen Ma, Ying Hu, ... Hao Huang in EURASIP Journal on Audio, Speech, and Music Processing

Article Open access 26 June 2024

Automatic dysarthria detection and severity level assessment using CWT-layered CNN model

Dysarthria is a speech disorder that affects the ability to communicate due to articulation difficulties. This research proposes a novel method for...

Shaik Sajiha, Kodali Radha, ... Durga Prasad Bavirisetti in EURASIP Journal on Audio, Speech, and Music Processing

Article Open access 25 June 2024

Estimating the first and second derivatives of discrete audio data

A new method for estimating the first and second derivatives of discrete audio signals intended to achieve higher computational precision in...

Marcin Lewandowski in EURASIP Journal on Audio, Speech, and Music Processing

Article Open access 18 June 2024

MIRACLE—a microphone array impulse response dataset for acoustic learning

This work introduces a large dataset comprising impulse responses of spatially distributed sources within a plane parallel to a planar microphone...

Adam Kujawski, Art J. R. Pelling, Ennes Sarradj in EURASIP Journal on Audio, Speech, and Music Processing

Article Open access 18 June 2024

Music time signature detection using ResNet18

Time signature detection is a fundamental task in music information retrieval, aiding in music organization. In recent years, the demand for robust...

Jeremiah Abimbola, Daniel Kostrzewa, Pawel Kasprowski in EURASIP Journal on Audio, Speech, and Music Processing

Article Open access 13 June 2024

Exploration of Whisper fine-tuning strategies for low-resource ASR

Limited data availability remains a significant challenge for Whisper’s low-resource speech recognition performance, falling short of practical...

Yunpeng Liu, Xukui Yang, Dan Qu in EURASIP Journal on Audio, Speech, and Music Processing

Article Open access 01 June 2024

Optimizing feature fusion for improved zero-shot adaptation in text-to-speech synthesis

In the era of advanced text-to-speech (TTS) systems capable of generating high-fidelity, human-like speech by referring a reference speech, voice...

Zhiyong Chen, Zhiqi Ai, ... Shugong Xu in EURASIP Journal on Audio, Speech, and Music Processing

Article Open access 28 May 2024

Towards multidimensional attentive voice tracking—estimating voice state from auditory glimpses with regression neural networks and Monte Carlo sampling

Selective attention is a crucial ability of the auditory system. Computationally, following an auditory object can be illustrated as tracking its...

Joanna Luberadzka, Hendrik Kayser, ... Volker Hohmann in EURASIP Journal on Audio, Speech, and Music Processing

Article Open access 22 May 2024

Sampling the user controls in neural modeling of audio devices

This work studies neural modeling of nonlinear parametric audio circuits, focusing on how the diversity of settings of the target device user...

Otto Mikkonen, Alec Wright, Vesa Välimäki in EURASIP Journal on Audio, Speech, and Music Processing

Article Open access 20 May 2024

Continuous lipreading based on acoustic temporal alignments

Visual speech recognition (VSR) is a challenging task that has received increasing interest during the last few decades. Current state of the art...

David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos in EURASIP Journal on Audio, Speech, and Music Processing

Article Open access 06 May 2024

Mi-Go: tool which uses YouTube as data source for evaluating general-purpose speech recognition machine learning models

This article introduces Mi-Go, a tool aimed at evaluating the performance and adaptability of general-purpose speech recognition machine learning...

Tomasz Wojnar, Jarosław Hryszko, Adam Roman in EURASIP Journal on Audio, Speech, and Music Processing

Article Open access 01 May 2024

Exploring the power of pure attention mechanisms in blind room parameter estimation

Dynamic parameterization of acoustic environments has drawn widespread attention in the field of audio processing. Precise representation of local...

Chunxi Wang, Maoshen Jia, ... Wenyu Jin in EURASIP Journal on Audio, Speech, and Music Processing

Article Open access 24 April 2024

Robust acoustic reflector localization using a modified EM algorithm

In robotics, echolocation has been used to detect acoustic reflectors, e.g., walls, as it aids the robotic platform to navigate in darkness and also...

Usama Saqib, Mads Græsbøll Christensen, Jesper Rindom Jensen in EURASIP Journal on Audio, Speech, and Music Processing

Article Open access 18 April 2024

Supervised Attention Multi-Scale Temporal Convolutional Network for monaural speech enhancement

Speech signals are often distorted by reverberation and noise, with a widely distributed signal-to-noise ratio (SNR). To address this, our study...

Zehua Zhang, Lu Zhang, ... Mingjiang Wang in EURASIP Journal on Audio, Speech, and Music Processing

Article Open access 11 April 2024

Correction: DeepDet: YAMNet with BottleNeck Attention Module (BAM) for TTS synthesis detection

Rabbia Mahum, Aun Irtaza, ... Haseeb Hassan in EURASIP Journal on Audio, Speech, and Music Processing

Article Open access 11 April 2024

Multi-rate modulation encoding via unsupervised learning for audio event detection

Technologies in healthcare, smart homes, security, ecology, and entertainment all deploy audio event detection (AED) in order to detect sound events...

Sandeep Reddy Kothinti, Mounya Elhilali in EURASIP Journal on Audio, Speech, and Music Processing

Article Open access 01 April 2024

DeepDet: YAMNet with BottleNeck Attention Module (BAM) for TTS synthesis detection

Spoofed speeches are becoming a big threat to society due to advancements in artificial intelligence techniques. Therefore, there must be an...

Rabbia Mahum, Aun Irtaza, ... Haseeb Hassan in EURASIP Journal on Audio, Speech, and Music Processing

Article Open access 01 April 2024

Synthesis of soundfields through irregular loudspeaker arrays based on convolutional neural networks

Most soundfield synthesis approaches deal with extensive and regular loudspeaker arrays, which are often not suitable for home audio systems, due to...

Luca Comanducci, Fabio Antonacci, Augusto Sarti in EURASIP Journal on Audio, Speech, and Music Processing

Article Open access 28 March 2024
