Search Results
-
Adaptive multi-task learning for speech to text translation
End-to-end speech to text translation aims to directly translate speech from one language into text in another, posing a challenging cross-modal task...
-
Fake speech detection using VGGish with attention block
While deep learning technologies have made remarkable progress in generating deepfakes, their misuse has become a well-known concern. As a result,...
-
GLFER-Net: a polyphonic sound source localization and detection network based on global-local feature extraction and recalibration
Polyphonic sound source localization and detection (SSLD) task aims to recognize the categories of sound events, identify their onset and offset...
-
Automatic dysarthria detection and severity level assessment using CWT-layered CNN model
Dysarthria is a speech disorder that affects the ability to communicate due to articulation difficulties. This research proposes a novel method for...
-
Estimating the first and second derivatives of discrete audio data
A new method for estimating the first and second derivatives of discrete audio signals intended to achieve higher computational precision in...
-
MIRACLE—a microphone array impulse response dataset for acoustic learning
This work introduces a large dataset comprising impulse responses of spatially distributed sources within a plane parallel to a planar microphone...
-
Music time signature detection using ResNet18
Time signature detection is a fundamental task in music information retrieval, aiding in music organization. In recent years, the demand for robust...
-
Exploration of Whisper fine-tuning strategies for low-resource ASR
Limited data availability remains a significant challenge for Whisper’s low-resource speech recognition performance, falling short of practical...
-
Optimizing feature fusion for improved zero-shot adaptation in text-to-speech synthesis
In the era of advanced text-to-speech (TTS) systems capable of generating high-fidelity, human-like speech by referring to a reference speech, voice...
-
Towards multidimensional attentive voice tracking—estimating voice state from auditory glimpses with regression neural networks and Monte Carlo sampling
Selective attention is a crucial ability of the auditory system. Computationally, following an auditory object can be illustrated as tracking its...
-
Sampling the user controls in neural modeling of audio devices
This work studies neural modeling of nonlinear parametric audio circuits, focusing on how the diversity of settings of the target device user...
-
Continuous lipreading based on acoustic temporal alignments
Visual speech recognition (VSR) is a challenging task that has received increasing interest during the last few decades. Current state of the art...
-
Mi-Go: tool which uses YouTube as data source for evaluating general-purpose speech recognition machine learning models
This article introduces Mi-Go, a tool aimed at evaluating the performance and adaptability of general-purpose speech recognition machine learning...
-
Exploring the power of pure attention mechanisms in blind room parameter estimation
Dynamic parameterization of acoustic environments has drawn widespread attention in the field of audio processing. Precise representation of local...
-
Robust acoustic reflector localization using a modified EM algorithm
In robotics, echolocation has been used to detect acoustic reflectors, e.g., walls, as it aids the robotic platform to navigate in darkness and also...
-
Supervised Attention Multi-Scale Temporal Convolutional Network for monaural speech enhancement
Speech signals are often distorted by reverberation and noise, with a widely distributed signal-to-noise ratio (SNR). To address this, our study...
-
Multi-rate modulation encoding via unsupervised learning for audio event detection
Technologies in healthcare, smart homes, security, ecology, and entertainment all deploy audio event detection (AED) in order to detect sound events...
-
DeepDet: YAMNet with BottleNeck Attention Module (BAM) for TTS synthesis detection
Spoofed speeches are becoming a big threat to society due to advancements in artificial intelligence techniques. Therefore, there must be an...
-
Synthesis of soundfields through irregular loudspeaker arrays based on convolutional neural networks
Most soundfield synthesis approaches deal with extensive and regular loudspeaker arrays, which are often not suitable for home audio systems, due to...