Abstract
Object-based audio techniques provide more flexibility and convenience for personalized rendering under various playback configurations. Many methods have been proposed to encode and transmit multiple audio objects at a low bit-rate. However, the recovered audio objects suffer from severe frequency aliasing distortion, which degrades the immersive sound quality. This paper describes a new structure that reduces the aliasing distortion of every object. In this method, we extract the residual and gain parameters of all objects after an N-step operation and use singular value decomposition to compress the residual matrices. In the decoder, the residual matrices compensate for the aliasing distortion. Moreover, because the object coding order affects the final decoded quality, we determine a suitable ordering strategy experimentally. In the experimental results, the energy sorting strategy is chosen as the best ordering strategy, and the residual information bit-rate is reduced from 14.11 kbps per object to 5.87 kbps per object. Compared with previous studies, our method performs better in both objective and subjective experiments. The proposed N-step residual compensating structure reduces every object’s aliasing distortion more effectively than the state-of-the-art methods.
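The two building blocks named in the abstract, energy-based ordering of objects and singular value decomposition of a residual matrix, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the descending sort direction, the fixed truncation rank, and the toy matrix sizes are all assumptions made for illustration, and the N-step residual extraction itself is not reproduced because the abstract does not specify it.

```python
import numpy as np


def order_by_energy(objects):
    """Return object indices sorted by total signal energy, highest first.

    Assumption: descending order. The abstract only states that an
    energy sorting strategy is used, not its direction.
    """
    energies = [float(np.sum(np.square(x))) for x in objects]
    return sorted(range(len(objects)), key=lambda i: energies[i], reverse=True)


def compress_residual(residual, rank):
    """Keep the `rank` largest singular components of a residual matrix."""
    u, s, vt = np.linalg.svd(residual, full_matrices=False)
    return u[:, :rank], s[:rank], vt[:rank, :]


def restore_residual(u, s, vt):
    """Rebuild the low-rank approximation from the stored factors."""
    return (u * s) @ vt


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Three toy "objects" (hypothetical signals) with different energies.
    objects = [0.1 * rng.standard_normal(1024),
               1.0 * rng.standard_normal(1024),
               0.5 * rng.standard_normal(1024)]
    print("coding order:", order_by_energy(objects))

    # A toy residual matrix (frequency bins x frames) that is nearly rank 2.
    residual = rng.standard_normal((64, 2)) @ rng.standard_normal((2, 32))
    residual += 1e-3 * rng.standard_normal((64, 32))

    u, s, vt = compress_residual(residual, rank=2)
    approx = restore_residual(u, s, vt)
    rel_err = np.linalg.norm(residual - approx) / np.linalg.norm(residual)
    print(u.size + s.size + vt.size, "stored values instead of", residual.size)
    print("relative reconstruction error:", rel_err)
```

Truncating the decomposition trades reconstruction accuracy for fewer stored values, which is the general mechanism by which a residual bit-rate can be lowered; how the rank is chosen and how the factors are quantized in the proposed method is not stated in the abstract.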
Acknowledgements
This research is partially supported by the National Key R&D Program of China (No. 2017YFB1002803), the National Natural Science Foundation of China (Nos. 61762005, U1736206), and the Basic Research Project of Science and Technology Plan of Shenzhen (JCYJ20170818143246278).
About this article
Cite this article
Hu, C., Wang, X., Hu, R. et al. Audio object coding based on N-step residual compensating. Multimed Tools Appl 80, 18717–18733 (2021). https://doi.org/10.1007/s11042-020-10339-0
DOI: https://doi.org/10.1007/s11042-020-10339-0