Abstract
Multichannel source separation has been a popular topic, and recently proposed methods based on the local Gaussian model have provided promising results despite their high computational cost when several sensors are used. This drawback limits the practical application of these methods to tasks such as sound field reconstruction or virtual reality. In this study, we present a numerical approach that reduces the complexity of multichannel nonnegative matrix factorization (MNMF) for audio source separation in scenarios with a large number of sensors, such as high-order ambisonics encoding. In particular, we propose a parallel driver to compute the multiplicative update rules in MNMF approaches. It is designed to run on sequential and multicore computers as well as on graphics processing units (GPUs). The solution reduces the computational cost of the multiplicative update rules by using Cholesky decomposition and solving several triangular systems of equations. The driver was evaluated in different scenarios, with promising results in terms of execution time on both the CPU and the GPU. To the best of our knowledge, this proposal is the first to address the problem of reducing the computational cost of full-rank MNMF-based systems using parallel and high-performance techniques.
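The core numerical idea named in the abstract, replacing explicit matrix inversion with a Cholesky factorization followed by triangular solves, can be illustrated with a minimal sketch. This is not the authors' driver: the function names (`forward_substitution`, `backward_substitution`, `solve_hpd`), the matrix size, and the synthetic data are all assumptions made for illustration, and the exact MNMF update rules are not reproduced here. The sketch only shows how a system A X = B with a Hermitian positive definite A (such as a modeled spatial covariance matrix for one time-frequency bin) might be solved without forming the inverse of A.

```python
import numpy as np


def forward_substitution(L, B):
    """Solve L Y = B for Y, where L is lower triangular."""
    n = L.shape[0]
    Y = np.zeros_like(B, dtype=complex)
    for i in range(n):
        # Subtract the contribution of the rows already solved.
        Y[i] = (B[i] - L[i, :i] @ Y[:i]) / L[i, i]
    return Y


def backward_substitution(U, Y):
    """Solve U X = Y for X, where U is upper triangular."""
    n = U.shape[0]
    X = np.zeros_like(Y, dtype=complex)
    for i in range(n - 1, -1, -1):
        X[i] = (Y[i] - U[i, i + 1:] @ X[i + 1:]) / U[i, i]
    return X


def solve_hpd(A, B):
    """Solve A X = B for Hermitian positive definite A without forming A^{-1}."""
    L = np.linalg.cholesky(A)                      # A = L L^H, L lower triangular
    Y = forward_substitution(L, B)                 # first triangular system: L Y = B
    return backward_substitution(L.conj().T, Y)    # second: L^H X = Y


# Hypothetical example: M = 4 channels, a single time-frequency bin.
rng = np.random.default_rng(0)
M = 4
G = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
A = G @ G.conj().T + M * np.eye(M)   # synthetic Hermitian positive definite matrix
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))

X = solve_hpd(A, B)
residual = np.linalg.norm(A @ X - B)  # should be close to machine precision
```

In a multichannel setting, each time-frequency bin would typically yield its own independent system of this kind, which is what makes the computation a natural candidate for multicore and GPU parallelism. The explicit loops above are for readability only; a practical implementation would presumably rely on optimized LAPACK-style routines for the triangular solves.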
Data Availability
The data generated in this study are available from the corresponding author upon request.
Funding
This work was supported by MCIN/AEI/10.13039/501100011033 under project grant PID2020-119082RB-{C21,C22}, and Gobierno del Principado de Asturias under grant AYUD/2021/50994.
Author information
Contributions
All the authors contributed equally to this study.
Ethics declarations
Conflict of interest
The authors declare no conflicts of interest.
Ethical Approval
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Muñoz-Montoro, A.J., Carabias-Orti, J.J., Salvati, D. et al. Efficient parallel kernel based on Cholesky decomposition to accelerate multichannel nonnegative matrix factorization. J Supercomput 79, 20649–20664 (2023). https://doi.org/10.1007/s11227-023-05471-1
DOI: https://doi.org/10.1007/s11227-023-05471-1