-
Article
OV-DAR: Open-Vocabulary Object Detection and Attributes Recognition
In this paper, we endeavor to localize all potential objects in an image and infer their visual categories, attributes, and shapes, even in instances where certain objects have not been encompassed in the mode...
-
Article
OV-VIS: Open-Vocabulary Video Instance Segmentation
Conventionally, the goal of Video Instance Segmentation (VIS) is to segment and categorize objects in videos from a closed set of training categories, lacking the generalization ability to handle novel categor...
-
Chapter and Conference Paper
A Novel Ensemble Approach for Click-Through Rate Prediction Based on Factorization Machines and Gradient Boosting Decision Trees
Click-Through Rate (CTR) prediction is a significant technique in the field of computational advertising; its accuracy directly affects companies' profits and user experience. Achieving great ability of general...
-
Chapter and Conference Paper
Spectral Tilt Estimation for Speech Intelligibility Enhancement Using RNN Based on All-Pole Model
Speech intelligibility enhancement is extremely meaningful for successful speech communication in noisy environments. Several methods based on the Lombard effect are used to increase intelligibility. In those meth...
-
Chapter and Conference Paper
Head Related Transfer Function Interpolation Based on Aligning Operation
The head-related transfer function (HRTF) is the main technique of binaural synthesis, which is used to reconstruct the spatial sound image, and HRTF data can only be obtained by measurement. A high resolution HRT...
-
Chapter and Conference Paper
Research on Perception Sensitivity of Elevation Angle in 3D Sound Field
The development of virtual reality and three-dimensional (3D) video has inspired interest in 3D audio. 3D audio aims at reconstructing the spatial information of original signals, the spatial perception sens...
-
Chapter and Conference Paper
Real-Life Voice Activity Detection Based on Audio-Visual Alignment
Voice activity detection (VAD) is a technology to identify whether the persons in multimedia are speaking. Most of the research efforts focused on utilizing audio and visual information to implement voice acti...
-
Chapter and Conference Paper
Simplification of 3D Multichannel Sound System Based on Multizone Soundfield Reproduction
Home sound environments are becoming increasingly important to the entertainment and audio industries. Compared with single zone soundfield reproduction, 3D spatial multizone soundfield reproduction is a more ...
-
Chapter and Conference Paper
3D Panning Based Sound Field Enhancement Method for Ambisonics
When a conventional first-order Ambisonics system uses four loudspeakers in a platonic solid layout to reconstruct the sound field, the 3D acoustic field effect is limited. A new signal distribution method is propose...
-
Chapter and Conference Paper
Multichannel Simplification Based on Deviation of Loudspeaker Positions
People hope to achieve a good impression of three-dimensional (3D) spatial sound with fewer loudspeakers at home. The present method simplified the number of loudspeakers based on the minimum area enclosed by ...
-
Chapter and Conference Paper
Low Bitrates Audio Bandwidth Extension Using a Deep Auto-Encoder
Modern audio coding technologies apply methods of bandwidth extension (BWE) to efficiently represent audio data at low bitrates. An established method is the well-known spectral band replication (SBR) that can...
-
Chapter and Conference Paper
Physical Properties of Sound Field Based Estimation of Phantom Source in 3D
3D spatial sound effects can be achieved by amplitude panning with several loudspeakers, which can produce the auditory event of a phantom source at an arbitrary location with loudspeakers at arbitrary locations in...
-
Chapter and Conference Paper
Multi-channel Object-Based Spatial Parameter Compression Approach for 3D Audio
To improve the spatial precision of three-dimensional (3D) audio, the bit rates of spatial parameters are increased sharply. This paper presents a spatial parameters compression approach to decrease the bit ra...
-
Chapter and Conference Paper
Reduction of Multichannel Sound System Based on Spherical Harmonics
To meet people’s demand for 3D audio at home, recreating a 3D spatial sound field with few loudspeakers is a critical problem. In this paper, we introduce an L- to (L-1)-channel reduction method base...
-
Chapter and Conference Paper
Automatic Multichannel Simplification with Low Impacts on Sound Pressure at Ears
People hope to use a minimal arrangement of loudspeakers to reproduce the experience of 3D film sound at home. Although Ando’s conversion can convert an n- to an m-channel sound system by maintaining the sound pr...