Tucker-3 decomposition with sparse core array using a penalty function based on Gini-index

Original Paper · Japanese Journal of Statistics and Data Science

Abstract

Tucker-3 decomposition is a dimension-reduction method for tensor data, analogous to principal component analysis. A distinguishing feature of Tucker-3 is the core array, which represents the interactions between the low-dimensional spaces. However, the result is difficult to interpret when the core array has many elements. One remedy is sparse estimation of the core array, for example by the L1 regularization method; however, such regularization methods often sacrifice too much model fit. To address this issue, we propose a novel estimation method for Tucker-3 decomposition with a penalty function based on the Gini index, which is a measure of both sparsity and variance. Maximizing the Gini index is expected to yield an estimate of the core array that is easy to interpret. Moreover, the fitted model suffers little shrinkage, because the Gini index is a measure of variance, which is one of the model-fit measures of Tucker-3. The proposed penalty function leads to a nonconvex optimization problem; to address it, we develop a majorization–minimization algorithm. Numerical examples show that our method outperforms estimation with existing penalties, such as the L1 penalty, the smoothly clipped absolute deviation, and the minimax concave penalty, in precision and in the accurate prediction of zero cells.
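Since the penalty rests on the Gini index as a sparsity measure, a minimal sketch may help fix ideas. The following illustration is ours, not the authors' code: it computes the Gini index of a coefficient vector in the form given by Hurley and Rickard (2009), cited below, and shows that a dense array scores near 0 while a concentrated one scores near 1. The function name `gini_index` is ours.

```python
import numpy as np

def gini_index(c):
    """Gini sparsity index in the form of Hurley and Rickard (2009):
    0 for equal-magnitude entries, approaching 1 as the mass
    concentrates in a few entries.
    """
    c = np.sort(np.abs(np.ravel(c)))  # magnitudes in ascending order
    n, l1 = c.size, c.sum()
    if l1 == 0.0:                     # all-zero input: treat as maximally dense
        return 0.0
    k = np.arange(1, n + 1)
    return 1.0 - 2.0 * np.sum((c / l1) * (n - k + 0.5) / n)

print(gini_index(np.ones(8)))                             # 0.0   (dense)
print(gini_index(np.array([0, 0, 0, 0, 0, 0, 0, 2.0])))   # 0.875 (sparse)
```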


References

  • Adachi, K. (2020). Matrix-based introduction to multivariate data analysis (2nd ed.). Springer Singapore.

  • Allen, G. (2012). Sparse higher-order principal components analysis. In Proceedings of the 15th international conference on artificial intelligence and statistics (pp. 27–35).

  • Borg, I., & Groenen, P. J. (2005). Modern multidimensional scaling: Theory and applications. Springer.

  • Candès, E. J., Wakin, M. B., & Boyd, S. P. (2008). Enhancing sparsity by reweighted \(\ell_1\) minimization. Journal of Fourier Analysis and Applications, 14, 877–905.

  • De Lathauwer, L., De Moor, B., & Vandewalle, J. (2000). A multilinear singular value decomposition. SIAM Journal on Matrix Analysis and Applications, 21, 1253–1278.

  • Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360.

  • Harshman, R. (1970). Foundations of the PARAFAC procedure: Models and conditions for an “explanatory” multi-modal factor analysis. UCLA Working Papers in Phonetics, 16, 1–84.

  • Hurley, N., & Rickard, S. (2009). Comparing measures of sparsity. IEEE Transactions on Information Theory, 55, 4723–4741.

  • Ikemoto, H., & Adachi, K. (2016). Sparse Tucker2 analysis of three-way data subject to a constrained number of zero elements in a core array. Computational Statistics and Data Analysis, 98, 1–18.

  • Kaiser, H. F. (1958). The varimax criterion for analytic rotation in factor analysis. Psychometrika, 23, 187–200.

  • Kiers, H. A. L. (1998). Three-way simplimax for oblique rotation of the three-mode factor analysis core to simple structure. Computational Statistics and Data Analysis, 28, 307–324.

  • Kiers, H. A. L. (2000). Towards a standardized notation and terminology in multiway analysis. Journal of Chemometrics, 14, 105–122.

  • Kojima, H. (1975). Inter-battery factor analysis of parents' and children's reports of parental behavior. Japanese Psychological Research, 17, 33–48.

  • Kroonenberg, P. M. (1983). Three-mode principal component analysis. DSWO Press.

  • Kroonenberg, P. M. (2008). Applied multiway data analysis. Wiley.

  • Li, G. (2020). Generalized co-clustering analysis via regularized alternating least squares. Computational Statistics and Data Analysis, 150, 106989.

  • Liu, Y., Song, R., Lu, W., & Xiao, Y. (2021). A probit tensor factorization model for relational learning. Journal of Computational and Graphical Statistics. https://doi.org/10.1080/10618600.2021.2003204.

  • Lundy, M. E., Harshman, R. A., & Kruskal, J. B. (1989). A two-stage procedure incorporating good features of both trilinear and quadrilinear models. In Multiway data analysis (pp. 123–130). Elsevier.

  • Murakami, T., Ten Berge, J. M., & Kiers, H. A. (1998). A case of extreme simplicity of the core matrix in three-mode principal component analysis. Psychometrika, 63, 255–261.

  • Neuhaus, J. O., & Wrigley, C. F. (1954). The quartimax method: An analytic approach to orthogonal simple structure. British Journal of Statistical Psychology, 7, 81–91.

  • Phan, A. H., & Cichocki, A. (2010). Tensor decompositions for feature extraction and classification of high dimensional datasets. Nonlinear Theory and Its Applications, IEICE, 1, 37–68.

  • Sass, D., & Schmitt, T. (2010). A comparative investigation of rotation criteria within exploratory factor analysis. Multivariate Behavioral Research, 45, 73–103.

  • Sun, W. W., & Cheng, G. (2017). Provable sparse tensor decomposition. Journal of the Royal Statistical Society Series B, 79, 899–916.

  • Ten Berge, J. M. F., & Kiers, H. A. L. (1999). Simplicity of core arrays in three-way principal component analysis and the typical rank of \(p \times q \times 2\) arrays. Linear Algebra and its Applications, 294, 169–179.

  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B, 58, 267–288.

  • Thurstone, L. L. (1947). Multiple-factor analysis. University of Chicago Press.

  • Tucker, L. (1966). Some mathematical notes on three-mode factor analysis. Psychometrika, 31, 279–311.

  • Zhang, A., & Han, R. (2019). Optimal sparse singular value decomposition for high-dimensional high-order data. Journal of the American Statistical Association, 114, 1708–1725.

  • Zhang, C. H. (2010). Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 38, 894–942.

  • Zonoobi, D., Kassim, A. A., & Venkatesh, Y. V. (2011). Gini index as sparsity measure for signal reconstruction from compressive samples. IEEE Journal of Selected Topics in Signal Processing, 5, 1–13.

  • Zou, H., Hastie, T., & Tibshirani, R. (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15, 265–286.

Acknowledgements

We thank the associate editor and two anonymous referees for their constructive comments, which led to significant improvements in this article. This work was supported by JSPS KAKENHI Grant No. JP19K20226.

Author information


Corresponding author

Correspondence to Jun Tsuchida.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 41 KB)

Appendix A: Deriving the updated formula of the core array

Let \(\varvec{a} \in {\mathbb {R}}^{p}\) be the parameter vector and \(\varvec{x}\in {\mathbb {R}}^{p}\) a constant vector. We consider minimizing the objective function \(f\) defined as follows:

$$\begin{aligned}&f(\varvec{a}\mid \varvec{x},\lambda ,\alpha ,\{w_i\}_{i=1}^{p},\{\beta _i\}_{i=1}^{p})\\&\quad = \dfrac{1}{2}\Vert \varvec{x}-\varvec{a}\Vert ^2_2 + \lambda \left( \dfrac{1}{\alpha }\sum _{i=1}^{p} w_i|a_{i}| - \sum _{i=1}^{p} \beta _i \log (|a_i|+\epsilon )\right) \\&\quad = \sum _{i=1}^{p}\left( \dfrac{1}{2}(x_i - a_i)^2 + \lambda \left( \dfrac{w_i}{\alpha }|a_{i}| - \beta _i \log (|a_i|+\epsilon )\right) \right) , \end{aligned}$$

where \(\alpha > 0\), \(w_i \ge 0\), \(\beta _i \in [0,1]\), and \(\epsilon > 0\) are constants. Because \(f\) is separable, we minimize each term \(g(a_i \mid x_i) = (x_i - a_i)^2/2 + \lambda \left( w_i|a_i|/\alpha - \beta _i \log (|a_i|+\epsilon )\right) \) individually. The second term of \(g(a_i \mid x_i)\) depends on \(a_i\) only through \(|a_i|\) and is therefore independent of the sign of \(a_i\). Thus, when \(x_i \ne 0\), the sign of the optimal solution of \(g(a_i \mid x_i)\) is the same as that of \(x_i\), because \(-2x_i a_i \le -2x_i c\) holds whenever \(a_i\) has the same sign as \(x_i\) and \(|a_i| = |c|\).

For the case \(x_i > 0\), the candidate optimal solutions of \(g\) therefore satisfy \(a_i > 0\), and \(g(a_i \mid x_i)\) is convex on \(a_i > 0\). Setting the derivative of \(g(a_i \mid x_i)\) to zero, we obtain the following equation:

$$\begin{aligned}&a_i - x_i + \lambda \left( \dfrac{w_i}{\alpha } - \dfrac{\beta _i}{a_i + \epsilon }\right) = 0 \\&\quad \Longleftrightarrow (a_i - x_i)(a_i + \epsilon ) + \lambda \dfrac{w_i}{\alpha }(a_i + \epsilon ) - \lambda \beta _i = 0\\&\quad \Longleftrightarrow a_i^2 + \left( -x_i + \epsilon + \lambda \dfrac{w_i}{\alpha }\right) a_i + \left( -x_i\epsilon + \lambda \dfrac{w_i\epsilon - \beta _i\alpha }{\alpha }\right) = 0. \end{aligned}$$
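The roots displayed next follow from the quadratic formula. For completeness, writing \(t = \lambda w_i/\alpha \), the discriminant simplifies by completing the square, a step the derivation leaves implicit:

$$\begin{aligned} \left( -x_i + \epsilon + t\right) ^2 - 4\left( -x_i\epsilon + t\epsilon - \lambda \beta _i\right) = \left( x_i + \epsilon - t\right) ^2 + 4\lambda \beta _i. \end{aligned}$$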

In the case \(a_i > 0\), the candidate extrema of \(g\) are obtained as follows:

$$\begin{aligned} a_i&= \dfrac{1}{2}\left( x_i - \left( \epsilon + \lambda \dfrac{w_i}{\alpha }\right) \pm \sqrt{\left( x_i + \epsilon - \lambda \dfrac{w_i}{\alpha }\right) ^2 + 4\lambda \beta _i}\right) . \end{aligned}$$

Because \(x_i - (\epsilon + \lambda {w_i}/{\alpha }) < \sqrt{(x_i + \epsilon - \lambda {w_i}/{\alpha })^2 + 4\lambda \beta _i}\) holds, the root with the minus sign is negative; hence, in the case \(a_i > 0\), the candidate extremum of \(g\) is

$$\begin{aligned} a_i&= \dfrac{1}{2}\max \left\{ 0,\; x_i - \left( \epsilon + \lambda \dfrac{w_i}{\alpha }\right) + \sqrt{\left( x_i + \epsilon - \lambda \dfrac{w_i}{\alpha }\right) ^2 + 4\lambda \beta _i}\right\} . \end{aligned}$$

For the case \(x_i < 0\), in the same way as for \(x_i > 0\), the candidate extrema of \(g\) are obtained as follows:

$$\begin{aligned} a_i&= \dfrac{1}{2}\left( x_i + \left( \epsilon + \lambda \dfrac{w_i}{\alpha }\right) \pm \sqrt{\left( x_i - \epsilon + \lambda \dfrac{w_i}{\alpha }\right) ^2 + 4\lambda \beta _i}\right) . \end{aligned}$$

The root \(x_i + (\epsilon + \lambda w_i/\alpha ) + \sqrt{(x_i - \epsilon + \lambda {w_i}/{\alpha })^2 + 4\lambda \beta _i}\) is positive, because the square root is at least \(|x_i - \epsilon + \lambda {w_i}/{\alpha }|\) and \(x_i + (\epsilon + \lambda w_i/\alpha ) + |x_i - \epsilon + \lambda {w_i}/{\alpha }|\) is positive. This excludes the root with the plus sign; thus, in the case \(a_i < 0\), the candidate extremum of \(g\) is

$$\begin{aligned} a_i&= \dfrac{1}{2}\min \left\{ 0,\; x_i + \left( \epsilon + \lambda \dfrac{w_i}{\alpha }\right) - \sqrt{\left( x_i - \epsilon + \lambda \dfrac{w_i}{\alpha }\right) ^2 + 4\lambda \beta _i}\right\} . \end{aligned}$$

If \(x_i=0\), the optimal points of \(g(a_i|x_i)\) are

$$\begin{aligned} \dfrac{\left( \epsilon + \lambda \dfrac{w_i}{\alpha }\right) - \sqrt{\left( \epsilon - \lambda \dfrac{w_i}{\alpha }\right) ^2 + 4\lambda \beta _i}}{2} \quad \text {and}\quad \dfrac{-\left( \epsilon + \lambda \dfrac{w_i}{\alpha }\right) + \sqrt{\left( \epsilon - \lambda \dfrac{w_i}{\alpha }\right) ^2 + 4\lambda \beta _i}}{2}. \end{aligned}$$

Thus, we obtain the optimal points of \(g(a_i|x_i)\) as

$$\begin{aligned} a_i = {\left\{ \begin{array}{ll} \dfrac{1}{2}\min \left\{ 0,\; x_i + \left( \epsilon + \lambda \dfrac{w_i}{\alpha }\right) - \sqrt{\left( x_i - \epsilon + \lambda \dfrac{w_i}{\alpha }\right) ^2 + 4\lambda \beta _i}\right\} &{}(x_i < 0)\\ \\ \dfrac{1}{2}\max \left\{ 0,\; x_i - \left( \epsilon + \lambda \dfrac{w_i}{\alpha }\right) + \sqrt{\left( x_i + \epsilon - \lambda \dfrac{w_i}{\alpha }\right) ^2 + 4\lambda \beta _i}\right\} &{}(x_i \ge 0) \end{array}\right. }. \end{aligned}$$

Therefore, we obtain the updated formula of \(\varvec{G}_i\) when we set \(\varvec{x} = \varvec{\eta }+\varvec{M}'_g(\mathrm {Vec}(\varvec{X}_1)- \varvec{M}_g\varvec{\eta })\), \(\epsilon = \epsilon /R\), and \(\lambda = \lambda /L\).
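As a concrete illustration, the following minimal sketch is ours, written for this exposition rather than taken from the authors; the function name `core_update` and the numerical check are hypothetical. It implements the piecewise closed form above with NumPy and verifies one coordinate against a brute-force grid search on \(g\):

```python
import numpy as np

def core_update(x, lam, alpha, w, beta, eps):
    """Elementwise minimizer of
    g(a | x) = (x - a)^2 / 2 + lam * (w * |a| / alpha - beta * log(|a| + eps)),
    following the case analysis derived above. Inputs broadcast elementwise.
    """
    x = np.asarray(x, dtype=float)
    t = eps + lam * w / alpha
    pos = 0.5 * np.maximum(
        0.0, x - t + np.sqrt((x + eps - lam * w / alpha) ** 2 + 4 * lam * beta))
    neg = 0.5 * np.minimum(
        0.0, x + t - np.sqrt((x - eps + lam * w / alpha) ** 2 + 4 * lam * beta))
    return np.where(x >= 0, pos, neg)

# Sanity check of one coordinate against a dense grid search on g.
x, lam, alpha, w, beta, eps = 0.8, 0.5, 1.0, 1.0, 0.3, 1e-4
a = np.linspace(-2.0, 2.0, 400001)
g = 0.5 * (x - a) ** 2 + lam * (w * np.abs(a) / alpha - beta * np.log(np.abs(a) + eps))
print(core_update(x, lam, alpha, w, beta, eps))  # closed-form minimizer
print(a[np.argmin(g)])                           # agrees to grid precision
```

Within the majorization–minimization algorithm, this elementwise map is applied with \(\varvec{x}\), \(\epsilon \), and \(\lambda \) substituted as in the preceding sentence.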

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tsuchida, J., Yadohisa, H. Tucker-3 decomposition with sparse core array using a penalty function based on Gini-index. Jpn J Stat Data Sci 5, 675–700 (2022). https://doi.org/10.1007/s42081-022-00179-7
