Audio object coding based on N-step residual compensating

Hu, Chenhao; Wang, Xiaochen; Hu, Ruimin; Wu, Yulin

doi:10.1007/s11042-020-10339-0


Published: 19 February 2021

Volume 80, pages 18717–18733 (2021)
Multimedia Tools and Applications

Chenhao Hu^1,
Xiaochen Wang (ORCID: orcid.org/0000-0002-1904-2097)^1,2,
Ruimin Hu^1,3 &
Yulin Wu^1

245 Accesses
4 Citations

Abstract

Object-based audio techniques provide greater flexibility and convenience for personalized rendering under various playback configurations. Many methods have been proposed to encode and transmit multiple audio objects at a low bit-rate. However, the recovered audio objects suffer from severe frequency aliasing distortion, which degrades the immersive sound quality. This paper describes a new structure that reduces each object’s aliasing distortion. In this method, we extract the residual and gain parameters of all objects after an N-step operation and use singular value decomposition (SVD) to compress the residual matrices. In the decoder, the residual matrices compensate for the aliasing distortion. Moreover, because the object coding order affects the final decoded quality, we determine a proper ordering strategy experimentally. Based on the experimental results, energy sorting is chosen as the best ordering strategy, and the residual information bit-rate can be reduced from 14.11 kbps per object to 5.87 kbps per object. Compared with previous studies, our method achieves better performance in both objective and subjective experiments, reducing each object’s aliasing distortion more effectively than state-of-the-art methods.
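The abstract names two concrete ingredients, SVD-based compression of the residual matrices and an energy-based object coding order, without giving their exact formulation. The Python sketch below illustrates both under stated assumptions: each object and each residual is taken to be a real-valued time-frequency matrix, an object's energy is taken to be the sum of its squared coefficients, and a residual is compressed by simply keeping its `rank` largest singular values. The helper names (`energy_order`, `compress_residual`, `decompress_residual`, `stored_values`) and the fixed-rank truncation rule are hypothetical illustrations, not the authors' N-step structure, quantization, or rank-selection method.

```python
import numpy as np


def energy_order(objects):
    """Return object indices sorted by descending energy.

    Assumption: energy = sum of squared time-frequency coefficients.
    """
    energies = [float(np.sum(np.square(obj))) for obj in objects]
    return sorted(range(len(objects)), key=lambda i: energies[i], reverse=True)


def compress_residual(residual, rank):
    """Keep only the `rank` largest singular triplets of a residual matrix."""
    u, s, vt = np.linalg.svd(residual, full_matrices=False)
    return u[:, :rank], s[:rank], vt[:rank, :]


def decompress_residual(u, s, vt):
    """Rebuild the rank-limited approximation U @ diag(s) @ Vt."""
    return (u * s) @ vt


def stored_values(shape, rank):
    """Coefficients needed for a rank-`rank` factorization of a `shape` matrix."""
    rows, cols = shape
    return rank * (rows + cols + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three hypothetical objects (64 frequency bins x 100 frames) at different levels.
    objects = [rng.normal(scale=a, size=(64, 100)) for a in (0.5, 2.0, 1.0)]
    print("coding order:", energy_order(objects))  # loudest object first

    # A hypothetical residual: low-rank structure plus a little noise.
    residual = rng.normal(size=(64, 4)) @ rng.normal(size=(4, 100))
    residual += 0.01 * rng.normal(size=residual.shape)
    for rank in (1, 2, 4, 8):
        approx = decompress_residual(*compress_residual(residual, rank))
        err = np.linalg.norm(residual - approx) / np.linalg.norm(residual)
        print(f"rank {rank}: {stored_values(residual.shape, rank)} of "
              f"{residual.size} values, relative error {err:.4f}")
```

Truncating an m × n residual to rank r transmits r(m + n + 1) coefficients instead of mn, which is the kind of saving SVD compression exploits; the real bit-rate also depends on quantization and entropy coding, which this sketch omits. For reference, the reported drop from 14.11 to 5.87 kbps per object is a reduction of roughly 58%.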



Acknowledgements

This research is partially supported by the National Key R&D Program of China (No. 2017YFB1002803), the National Natural Science Foundation of China (No. 61762005, U1736206), and the Basic Research Project of Science and Technology Plan of Shenzhen (JCYJ20170818143246278).

Author information

Authors and Affiliations

National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, 430072, China
Chenhao Hu, Xiaochen Wang, Ruimin Hu & Yulin Wu
Research Institute of Wuhan University in Shenzhen, Shenzhen, 518000, China
Xiaochen Wang
Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, Wuhan, 430072, China
Ruimin Hu


Corresponding author

Correspondence to Xiaochen Wang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Hu, C., Wang, X., Hu, R. et al. Audio object coding based on N-step residual compensating. Multimed Tools Appl 80, 18717–18733 (2021). https://doi.org/10.1007/s11042-020-10339-0


Received: 19 May 2020
Revised: 04 September 2020
Accepted: 22 December 2020
Published: 19 February 2021
Issue Date: May 2021
DOI: https://doi.org/10.1007/s11042-020-10339-0
