Abstract
In estimation of a normal mean matrix under the matrix quadratic loss, we develop a general formula for the matrix quadratic risk of orthogonally invariant estimators. The derivation is based on several formulas for matrix derivatives of orthogonally invariant functions of matrices. As an application, we calculate the matrix quadratic risk of a singular value shrinkage estimator motivated by Stein’s proposal for improving on the Efron–Morris estimator 50 years ago.
1 Introduction
Suppose that we have a matrix observation \(X \in \mathbb {R}^{n \times p}\) whose entries are independent normal random variables \(X_{ij} \sim \textrm{N} (M_{ij},1)\), where \(M \in \mathbb {R}^{n \times p}\) is an unknown mean matrix. In this setting, we consider estimation of M under the matrix quadratic loss (Abu-Shanab et al., 2012; Matsuda and Strawderman, 2022)
\(L(M,\hat{M}) = (\hat{M}-M)^{\top } (\hat{M}-M), \qquad (1)\)
which takes a value in the set of \(p \times p\) positive semidefinite matrices. The risk function of an estimator \(\hat{M}=\hat{M}(X)\) is defined as \(R(M,\hat{M}) = \textrm{E}_M [ L ( M,\hat{M}(X) ) ]\), and an estimator \(\hat{M}_1\) is said to dominate another estimator \(\hat{M}_2\) if \(R(M,\hat{M}_1) \preceq R(M,\hat{M}_2)\) for every M, where \(\preceq \) is the Löwner order: \(A \preceq B\) means that \(B-A\) is positive semidefinite. Thus, if \(\hat{M}_1\) dominates \(\hat{M}_2\) under the matrix quadratic loss, then
\(c^{\top } R(M,\hat{M}_1) c \le c^{\top } R(M,\hat{M}_2) c\)
for every M and \(c \in \mathbb {R}^p\). In particular, each column of \(\hat{M}_1\) dominates the corresponding column of \(\hat{M}_2\) as an estimator of the corresponding column of M under quadratic loss. Recently, Matsuda and Strawderman (2022) investigated shrinkage estimation in this setting by introducing a concept called matrix superharmonicity, which can be viewed as a generalization of the theory by Stein (1974) for a normal mean vector. Note that shrinkage estimation of a normal mean matrix under the Frobenius loss, which is the trace of the matrix quadratic loss, has been well studied, e.g., (Matsuda and Komaki 2015; Tsukuma 2008; Tsukuma and Kubokawa 2020; Yuasa and Kubokawa 2023a, b; Zheng 1986).
Many common estimators of a normal mean matrix are orthogonally invariant; namely, they satisfy \(\hat{M}(PXQ) = P \hat{M}(X) Q\) for any orthogonal matrices \(P \in O(n)\) and \(Q \in O(p)\). This property can be viewed as a generalization of the rotational invariance of estimators for a normal mean vector, which is satisfied by many minimax shrinkage estimators (Fourdrinier et al. 2018). We focus on orthogonally invariant estimators given by
\(\hat{M} = X + \widetilde{\nabla } h(X), \qquad (2)\)
where h(X) is an orthogonally invariant function satisfying \(h(PXQ)=h(X)\) for any orthogonal matrices \(P \in O(n)\) and \(Q \in O(p)\), and \(\widetilde{\nabla }\) is the matrix gradient operator defined by (3). For example, the maximum-likelihood estimator \(\hat{M}=X\) corresponds to \(h(X)=0\). The Efron–Morris estimator (Efron and Morris 1972) defined by \(\hat{M}=X(I-(n-p-1)(X^{\top }X)^{-1})\) when \(n-p-1>0\) corresponds to \(h(X)={-(n-p-1)/2} \cdot \log \det (X^{\top } X)\). This estimator can be viewed as a matrix generalization of the James–Stein estimator, and it is minimax under the Frobenius loss (Efron and Morris 1972) as well as the matrix quadratic loss (Matsuda and Strawderman 2022). We will provide another example of an orthogonally invariant estimator of the form (2) in Sect. 4. Note that an estimator of the form (2) is called a pseudo-Bayes estimator (Fourdrinier et al. 2018), because it coincides with the (generalized) Bayes estimator when h is given by the logarithm of the marginal distribution of X with respect to some prior on M (Tweedie’s formula).
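As a quick numerical sanity check (our own, not code from the paper), the Efron–Morris estimator's closed form \(X(I-(n-p-1)(X^{\top }X)^{-1})\) should agree with \(X+\widetilde{\nabla }h(X)\) for \(h(X)=-((n-p-1)/2)\log \det (X^{\top }X)\), where the matrix gradient is taken entrywise. A central finite-difference approximation of the gradient confirms this:

```python
import numpy as np

# Compare the Efron-Morris closed form with X + grad h(X), where the
# gradient of h(X) = -((n-p-1)/2) log det(X^T X) is approximated by
# central finite differences in each entry of X.
rng = np.random.default_rng(0)
n, p = 8, 3
X = rng.standard_normal((n, p))

def h(Y):
    # slogdet returns (sign, log|det|); X^T X is positive definite here
    return -0.5 * (n - p - 1) * np.linalg.slogdet(Y.T @ Y)[1]

eps = 1e-6
grad = np.zeros_like(X)
for a in range(n):
    for i in range(p):
        E = np.zeros_like(X)
        E[a, i] = eps
        grad[a, i] = (h(X + E) - h(X - E)) / (2 * eps)

em_closed = X @ (np.eye(p) - (n - p - 1) * np.linalg.inv(X.T @ X))
em_gradient_form = X + grad
assert np.allclose(em_closed, em_gradient_form, atol=1e-5)
```

The agreement (up to finite-difference error) illustrates why the Efron–Morris estimator is a pseudo-Bayes estimator of the form (2).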
In this study, to further the theory of shrinkage estimation under the matrix quadratic loss, we develop a general formula for the matrix quadratic risk of orthogonally invariant estimators of the form (2). First, we prepare several matrix derivative formulas in Sect. 2. Then, we derive the formula for the matrix quadratic risk in Sect. 3. Finally, we present an example in Sect. 4, which is motivated by Stein’s proposal for improving on the Efron–Morris estimator 50 years ago (Stein 1974).
2 Matrix derivative formulas
Here, we develop matrix derivative formulas based on Stein (1974). Note that, whereas Stein (1974) considered a setting where X is a \(p \times n\) matrix, here we take X to be an \(n \times p\) matrix. In the following, the subscripts a, b, \(\ldots \) run from 1 to n and the subscripts i, j, \(\ldots \) run from 1 to p. We denote the Kronecker delta by \(\delta _{ij}\).
We employ the following notations for matrix derivatives introduced in Matsuda and Strawderman (2022).
Definition 1
For a function \(f: \mathbb {R}^{n \times p} \rightarrow \mathbb {R}\), its matrix gradient \(\widetilde{\nabla } f: \mathbb {R}^{n \times p} \rightarrow \mathbb {R}^{n \times p}\) is defined as
\((\widetilde{\nabla } f(X))_{ai} = \frac{\partial }{\partial X_{ai}} f(X). \qquad (3)\)
Definition 2
For a \(C^2\) function \(f: \mathbb {R}^{n \times p} \rightarrow \mathbb {R}\), its matrix Laplacian \(\widetilde{\Delta } f: \mathbb {R}^{n \times p} \rightarrow \mathbb {R}^{p \times p}\) is defined as
\((\widetilde{\Delta } f(X))_{ij} = \sum _{a=1}^{n} \frac{\partial ^2}{\partial X_{ai} \partial X_{aj}} f(X). \qquad (4)\)
Let
\(X^{\top } X = V \Lambda V^{\top } \qquad (5)\)
be a spectral decomposition of \(X^{\top } X\), where \(V=(v_1,\dots ,v_p)\) is an orthogonal matrix and \(\Lambda = \textrm{diag}(\lambda _1,\dots ,\lambda _p)\) is a diagonal matrix with \(\lambda _1 \ge \cdots \ge \lambda _p \ge 0\). Then, the derivatives of \(\lambda _i\) and \(V_{ij}\) with respect to the entries of X are obtained as follows.
Lemma 1
The derivative of \(\lambda _i\) is
Thus
where \(v_i\) is the i-th column vector of V.
Proof
By differentiating \(V^{\top } V = I_p\) and using \((\textrm{d} V)^{\top } V=(V^{\top } \textrm{d} V)^{\top }\), we obtain
\(V^{\top } \textrm{d} V + (V^{\top } \textrm{d} V)^{\top } = 0, \qquad (8)\)
which shows that \(V^{\top } \textrm{d} V\) is antisymmetric.
Taking the differential of (5), we have
Then, multiplying (9) on the left by \(V^{\top }\) and on the right by V, we obtain
Since \(\Lambda \) and \(\textrm{d} \Lambda \) are diagonal and \((\textrm{d} V)^{\top } V=(V^{\top } \textrm{d} V)^{\top }\), the (i, j)th entry of (10) yields
Since \((V^{\top } \textrm{d} V)_{ji}=-(V^{\top } \textrm{d} V)_{ij}\) from (8), we obtain
On the other hand, from \(\textrm{d} (X^{\top } X) = (\textrm{d} X)^{\top } X + X^{\top } \textrm{d} X\)
By taking \(i=j\) in (11)
Then, using (12)
Thus, we obtain (6) and it leads to (7). \(\square \)
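In coordinates, the conclusion of Lemma 1 amounts to the classical first-order perturbation identity \(\textrm{d}\lambda _i = v_i^{\top } \textrm{d}(X^{\top }X) v_i\), i.e., the gradient of \(\lambda _i\) with respect to X is \(2 (X v_i) v_i^{\top }\). A hedged finite-difference check of this reading (our own numerics, assuming distinct eigenvalues):

```python
import numpy as np

# Check that the gradient of the largest eigenvalue of X^T X with
# respect to X equals 2 (X v_i) v_i^T, via central finite differences.
rng = np.random.default_rng(1)
n, p = 7, 3
X = rng.standard_normal((n, p))
lam, V = np.linalg.eigh(X.T @ X)   # eigenvalues in ascending order
i = p - 1                          # track the largest (well separated)
analytic = 2.0 * np.outer(X @ V[:, i], V[:, i])

eps = 1e-6
numeric = np.zeros_like(X)
for a in range(n):
    for j in range(p):
        E = np.zeros_like(X)
        E[a, j] = eps
        lp = np.linalg.eigvalsh((X + E).T @ (X + E))[i]
        lm = np.linalg.eigvalsh((X - E).T @ (X - E))[i]
        numeric[a, j] = (lp - lm) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-4)
```

The check uses the largest eigenvalue only because it is generically well separated, which keeps the finite differences stable.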
Lemma 2
The derivative of \(V_{ij}\) is
Proof
From (8), we have \((V^{\top } \textrm{d} V)_{ii}=0\). Also, from (11)
for \(i \ne j\). Therefore
Then, using (12)
where we switched k and l in the last step. Thus, we obtain (13). \(\square \)
A function h is said to be orthogonally invariant if it satisfies \(h(PXQ)=h(X)\) for any orthogonal matrices \(P \in O(n)\) and \(Q \in O(p)\). Such a function can be written as \(h(X)=H(\lambda )\), where \(\lambda = (\lambda _1,\dots ,\lambda _p)\) is the vector of eigenvalues of \(X^{\top } X\) in (5), and its derivatives are calculated as follows.
Lemma 3
The matrix gradient (3) of an orthogonally invariant function \(h(X)=H(\lambda )\) is
\(\frac{\partial h}{\partial X_{ai}} = 2 \sum _k \frac{\partial H}{\partial \lambda _k} (X v_k)_a V_{ik}. \qquad (14)\)
Thus
\(\widetilde{\nabla } h(X) = 2 X V D V^{\top }, \qquad (15)\)
where D is the \(p \times p\) diagonal matrix given by \(D_{ii} = \dfrac{\partial H}{\partial \lambda _i}\).
Proof
From (7)
which yields (14). Then, using \(X^{\top }X = V \Lambda V^{\top }\) and \(V^{\top } V = I_p\)
which yields (15). \(\square \)
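Lemma 3 can be illustrated on a concrete H of our own choosing (not one from the paper): for \(H(\lambda ) = \sum _k \lambda _k^2\), i.e., \(h(X) = \textrm{tr}((X^{\top }X)^2)\), reading the lemma as \(\widetilde{\nabla } h = 2XVDV^{\top }\) with \(D = \textrm{diag}(\partial H/\partial \lambda _k)\) should reproduce the classical identity \(\widetilde{\nabla }\, \textrm{tr}((X^{\top }X)^2) = 4 X X^{\top } X\):

```python
import numpy as np

# For H(lambda) = sum_k lambda_k^2, dH/dlambda_k = 2 lambda_k, so the
# lemma's formula 2 X V D V^T should equal 4 X X^T X exactly.
rng = np.random.default_rng(2)
n, p = 6, 4
X = rng.standard_normal((n, p))
lam, V = np.linalg.eigh(X.T @ X)
D = np.diag(2.0 * lam)             # dH/dlambda_k = 2 lambda_k
lemma3 = 2.0 * X @ V @ D @ V.T
classical = 4.0 * X @ X.T @ X
assert np.allclose(lemma3, classical)
```

The two expressions agree exactly because \(2XV\,\textrm{diag}(2\lambda _k)V^{\top } = 4 X V \Lambda V^{\top } = 4XX^{\top }X\).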
Lemma 4
The matrix Laplacian (4) of an orthogonally invariant function \(h(X)=H(\lambda )\) is
where D is the \(p \times p\) diagonal matrix given by
Proof
From (14)
Also, from (13)
By substituting (18) into (17) and taking the sum
where we used \(X^{\top }X = V \Lambda V^{\top }\) and
Then
where we used
Thus, by rewriting m to l, we obtain (16). \(\square \)
By taking the trace of the matrix Laplacian (16), we have
where we used
This coincides with the Laplacian formula in Stein (1974).
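As a hedged numerical check of the matrix Laplacian (our own computation, not from the paper), direct matrix calculus on the special case \(h(X) = \log \det (X^{\top }X)\) gives \(\widetilde{\Delta } h(X) = 2(n-p-1)(X^{\top }X)^{-1}\), which underlies the Efron–Morris estimator. We verify this by second-order central differences of h:

```python
import numpy as np

# Approximate the matrix Laplacian of h(X) = log det(X^T X) by mixed
# second-order central differences and compare with 2(n-p-1)(X^T X)^{-1}.
rng = np.random.default_rng(3)
n, p = 6, 2
X = rng.standard_normal((n, p))

def h(Y):
    return np.linalg.slogdet(Y.T @ Y)[1]

eps = 1e-4
lap = np.zeros((p, p))
for i in range(p):
    for j in range(p):
        for a in range(n):
            # mixed second difference in entries (a,i) and (a,j);
            # the += handles the diagonal case i == j correctly
            Epp = np.zeros_like(X); Epp[a, i] += eps; Epp[a, j] += eps
            Epm = np.zeros_like(X); Epm[a, i] += eps; Epm[a, j] -= eps
            Emp = np.zeros_like(X); Emp[a, i] -= eps; Emp[a, j] += eps
            Emm = np.zeros_like(X); Emm[a, i] -= eps; Emm[a, j] -= eps
            lap[i, j] += (h(X + Epp) - h(X + Epm) - h(X + Emp) + h(X + Emm)) / (4 * eps**2)

target = 2 * (n - p - 1) * np.linalg.inv(X.T @ X)
assert np.allclose(lap, target, atol=1e-3)
```

Taking the trace of the target recovers \(2(n-p-1)\,\textrm{tr}((X^{\top }X)^{-1})\), consistent with the trace formula above.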
3 Risk formula
Now, we derive a general formula for the matrix quadratic risk of orthogonally invariant estimators of the form (2).
Theorem 5
Let \(h(X)=H(\lambda )\) be an orthogonally invariant function. Then, the matrix quadratic risk of an estimator \(\hat{M}=X+\widetilde{\nabla }h(X)\) is given by
where D is the \(p \times p\) diagonal matrix given by
Proof
From Matsuda and Strawderman (2022), the matrix quadratic risk of an estimator \(\hat{M}=X+g(X)\) with a weakly differentiable function g is
where the matrix divergence \(\widetilde{\textrm{div}} \ g: \mathbb {R}^{n \times p} \rightarrow \mathbb {R}^{p \times p}\) of a function \(g: \mathbb {R}^{n \times p} \rightarrow \mathbb {R}^{n \times p}\) is defined as
Therefore, by substituting \(g(X)=\widetilde{\nabla }h(X)\) and using \(\widetilde{\textrm{div}} \circ \widetilde{\nabla }=\widetilde{\Delta }\)
Thus, using (15) and (16), we obtain (20). \(\square \)
By taking the trace of (20) and using (19), we obtain the following formula for the Frobenius risk of orthogonally invariant estimators, which coincides with the one given by Stein (1974).
Corollary 6
Let \(h(X)=H(\lambda )\) be an orthogonally invariant function. Then, the Frobenius risk of an estimator \(\hat{M}=X+\widetilde{\nabla }h(X)\) is given by
We derived the risk formula for orthogonally invariant estimators of the form (2), which are called pseudo-Bayes estimators (Fourdrinier et al. 2018). The class of pseudo-Bayes estimators includes all Bayes and generalized Bayes estimators. Extending the current result to general orthogonally invariant estimators is an interesting direction for future work. Extension to the unknown covariance case is also an important open problem; note that Section 6.6.2 of Tsukuma and Kubokawa (2020) derived a risk formula for a class of estimators in the unknown covariance setting.
4 Example
We provide an example of the application of Theorem 5. Let \(X = U \Sigma V^{\top }\) with \(U \in \mathbb {R}^{n \times p}\), \(\Sigma = \textrm{diag} (\sigma _1, \ldots , \sigma _p)\) and \(V \in \mathbb {R}^{p \times p}\) be a singular value decomposition of X, where \(U^{\top } U = V^{\top } V = I_p\) and \(\sigma _1 \ge \cdots \ge \sigma _p \ge 0\) are the singular values of X. We consider the orthogonally invariant estimator
\(\hat{M} = \sum _{k=1}^{p} \left( \sigma _k - \frac{c_k}{\sigma _k} \right) u_k v_k^{\top }, \qquad (22)\)
where \(c_1,\dots ,c_p \ge 0\) and \(u_k\) and \(v_k\) denote the k-th columns of U and V, respectively.
Lemma 7
The estimator (22) can be written in the form (2) with
\(h(X) = -\frac{1}{2} \sum _{k=1}^{p} c_k \log \lambda _k,\)
where \(\lambda _1,\dots ,\lambda _p\) are the eigenvalues of \(X^{\top } X\), as shown in (5).
Proof
From (7)
Thus
where we used \(X=U \Sigma V^{\top }\) and \(\lambda _k=\sigma _k^2\). Therefore, the estimator (22) is written as \(\hat{M}=X+\widetilde{\nabla } h(X)\). \(\square \)
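Lemma 7 can be checked numerically (a sketch under our reading that (22) is the singular value shrinkage estimator with shrunk values \(\sigma _k - c_k/\sigma _k\)): with \(h(X) = -\frac{1}{2}\sum _k c_k \log \lambda _k\), formula (15) gives \(\widetilde{\nabla } h(X) = -X V\, \textrm{diag}(c_k/\lambda _k) V^{\top }\), and \(X + \widetilde{\nabla } h(X)\) should reproduce the shrinkage form:

```python
import numpy as np

# Verify that the singular value shrinkage form of (22) coincides with
# X + grad h(X) for h(X) = -(1/2) sum_k c_k log lambda_k.
rng = np.random.default_rng(4)
n, p = 9, 3
X = rng.standard_normal((n, p))
c = np.array([5.0, 3.0, 1.0])

U, s, Vt = np.linalg.svd(X, full_matrices=False)   # s in descending order
svd_form = U @ np.diag(s - c / s) @ Vt             # estimator (22)

V = Vt.T
lam = s**2                                         # eigenvalues of X^T X
grad_h = -X @ V @ np.diag(c / lam) @ V.T           # formula (15)
assert np.allclose(svd_form, X + grad_h)
```

The agreement is exact because \(XV = U\Sigma \), so \(XV\,\textrm{diag}(c_k/\lambda _k)V^{\top } = U\,\textrm{diag}(c_k/\sigma _k)V^{\top }\).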
Theorem 8
The matrix quadratic risk of the estimator (22) is given by
where D is the \(p \times p\) diagonal matrix given by
Proof
To apply Theorem 5, let
We have
Thus
Therefore, we obtain (23) from Theorem 5. \(\square \)
The Efron–Morris estimator (Efron and Morris 1972) corresponds to (22) with \(c_k \equiv n-p-1\). In this case
Thus, its matrix quadratic risk (23) is
This coincides with the result in Matsuda and Strawderman (2022).
Motivated by Stein’s proposal (Stein 1974) for improving on the Efron–Morris estimator, we consider the estimator (22) with \(c_k = n+p-2k-1\). In the following, we call it “Stein’s estimator” for convenience. Stein (1974) stated that the positive part of Stein’s estimator dominates the positive part of the Efron–Morris estimator under the Frobenius loss (Note 1), where the “positive part” of (22) is the modification
\(\hat{M}^{+} = \sum _{k=1}^{p} \left( \sigma _k - \frac{c_k}{\sigma _k} \right)_{+} u_k v_k^{\top }, \qquad (25)\)
where \((a)_+=\max (0,a)\). It is known that the estimator (22) is dominated by its positive part (25) under the Frobenius loss (Tsukuma 2008).
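A minimal implementation of the positive-part modification (25), under the assumption that (22) shrinks the singular values to \(\sigma _k - c_k/\sigma _k\): components with \(\sigma _k^2 \le c_k\) are set to zero instead of having their sign flipped.

```python
import numpy as np

def positive_part_estimator(X, c):
    # Positive-part singular value shrinkage: (sigma_k - c_k/sigma_k)_+
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    shrunk = np.maximum(s - c / s, 0.0)        # (a)_+ = max(0, a)
    return U @ np.diag(shrunk) @ Vt

rng = np.random.default_rng(5)
X = rng.standard_normal((10, 3))
n, p = X.shape
# Stein's choice c_k = n + p - 2k - 1 (k = 1, ..., p)
c = np.array([n + p - 2 * k - 1 for k in range(1, p + 1)], dtype=float)
M_hat = positive_part_estimator(X, c)
# The estimate never has larger singular values than X itself
s_hat = np.linalg.svd(M_hat, compute_uv=False)
s_X = np.linalg.svd(X, compute_uv=False)
assert np.all(s_hat <= s_X + 1e-12)
```

This makes the shrinkage factors nonnegative, which is exactly the modification (25) describes.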
Proposition 9
The matrix quadratic risk of Stein’s estimator (estimator (22) with \(c_k = n+p-2k-1\)) is given by
where D is the \(p \times p\) diagonal matrix given by
Thus, Stein’s estimator dominates the maximum-likelihood estimator under the matrix quadratic loss when \(n \ge 3p-1\).
Proof
By substituting \(c_k=n+p-2k-1\) into Theorem 8
The second term is nonpositive, since \(\lambda _1 \ge \lambda _2 \ge \dots \ge \lambda _p\). When \(n \ge 3p-1\), the first term is also nonpositive, and thus
\(\square \)
Numerical results indicate that the bound on n in Proposition 9 may be relaxed to \(n \ge p+2\), which is the same bound as for the Efron–Morris estimator. See the Appendix.
Finally, we present simulation results to compare Stein’s estimator and the Efron–Morris estimator.
Figure 1 compares the Frobenius risk of Stein’s estimator and the Efron–Morris estimator for \(n=10\) and \(p=3\). The results suggest that Stein’s estimator dominates the Efron–Morris estimator under the Frobenius loss. Both estimators attain constant risk reduction when some singular values of M are small, regardless of the magnitude of the other singular values; thus, both estimators work well for low-rank matrices. See Matsuda and Strawderman (2022) for related discussion.
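A hedged Monte Carlo sketch of this comparison (our own simulation, not the paper's code): we estimate the Frobenius risks of the Efron–Morris estimator (\(c_k = n-p-1\)) and Stein's estimator (\(c_k = n+p-2k-1\)) at \(M = 0\), where both should lie far below the maximum-likelihood risk \(np = 30\).

```python
import numpy as np

# Monte Carlo estimate of Frobenius risks at M = 0, n = 10, p = 3.
rng = np.random.default_rng(6)
n, p, reps = 10, 3, 2000
c_em = np.full(p, n - p - 1.0)                                   # Efron-Morris
c_st = np.array([n + p - 2 * k - 1 for k in range(1, p + 1)], dtype=float)  # Stein

def estimate(X, c):
    # Singular value shrinkage estimator (22)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(s - c / s) @ Vt

M = np.zeros((n, p))
loss_em = loss_st = 0.0
for _ in range(reps):
    X = M + rng.standard_normal((n, p))
    loss_em += np.sum((estimate(X, c_em) - M) ** 2)
    loss_st += np.sum((estimate(X, c_st) - M) ** 2)
risk_em, risk_st = loss_em / reps, loss_st / reps
assert risk_st < risk_em < n * p
```

With this seed, both estimated risks are well below the MLE risk \(np\), and Stein's estimator shows the smaller risk, consistent with the figures.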
Fig. 2 Eigenvalues of the matrix quadratic risk of the Efron–Morris estimator (dashed) and Stein’s estimator (solid) for \(n=10\) and \(p=3\). Left: \(\sigma _2(M)=\sigma _3(M)=0\). Right: \(\sigma _1(M)=20\), \(\sigma _3(M)=0\). In the left panel, the second and third eigenvalues of each estimator almost overlap. In the right panel, the first eigenvalues of the two estimators almost overlap
Fig. 4 Eigenvalues of the matrix quadratic risk of the positive-part Efron–Morris estimator (dashed) and positive-part Stein’s estimator (solid) for \(n=10\) and \(p=3\). Left: \(\sigma _2(M)=\sigma _3(M)=0\). Right: \(\sigma _1(M)=20\), \(\sigma _3(M)=0\). In the left panel, the second and third eigenvalues of each estimator almost overlap. In the right panel, the first eigenvalues of the two estimators almost overlap
Figure 2 plots the three eigenvalues \(\lambda _1 \ge \lambda _2 \ge \lambda _3\) of the matrix quadratic risk of Stein’s estimator and the Efron–Morris estimator in the same setting as Fig. 1. Since all eigenvalues are less than \(n=10\), the matrix quadratic risk satisfies \(R(M,\hat{M}) \preceq n I_p\) for every M; thus, both estimators dominate the maximum-likelihood estimator under the matrix quadratic loss, which is compatible with (24) and Proposition 9. Also, each eigenvalue for Stein’s estimator is smaller than the corresponding one for the Efron–Morris estimator, which suggests that Stein’s estimator dominates the Efron–Morris estimator even under the matrix quadratic loss. Developing a rigorous proof of this dominance is an interesting problem for future work.
Figures 3 and 4 present the results for the positive-part estimators in the same settings as Figs. 1 and 2, respectively. They show qualitatively the same behavior.
Notes
1. Page 31 of Stein (1974): “It is not difficult to verify, and follows from the general formula (14) that the estimate (8) is better than the crude Efron–Morris estimate (9)”. However, we could not locate a proof of this claim; filling in this gap is an interesting problem for future work.
References
Abu-Shanab, R., Kent, J. T., & Strawderman, W. E. (2012). Shrinkage estimation with a matrix loss function. Electronic Journal of Statistics, 6, 2347–2355.
Efron, B., & Morris, C. (1972). Empirical Bayes on vector observations: An extension of Stein’s method. Biometrika, 59, 335–347.
Fourdrinier, D., Strawderman, W. E., & Wells, M. (2018). Shrinkage estimation. Springer.
Matsuda, T., & Komaki, F. (2015). Singular value shrinkage priors for Bayesian prediction. Biometrika, 102, 843–854.
Matsuda, T., & Strawderman, W. E. (2022). Estimation under matrix quadratic loss and matrix superharmonicity. Biometrika, 109, 503–519.
Stein, C. (1974). Estimation of the mean of a multivariate normal distribution. Proceedings of the Prague Symposium on Asymptotic Statistics, 2, 345–381.
Tsukuma, H. (2008). Admissibility and minimaxity of Bayes estimators for a normal mean matrix. Journal of Multivariate Analysis, 99, 2251–2264.
Tsukuma, H., & Kubokawa, T. (2020). Shrinkage estimation for mean and covariance matrices. Springer.
Yuasa, R., & Kubokawa, T. (2023a). Generalized Bayes estimators with closed forms for the normal mean and covariance matrices. Journal of Statistical Planning and Inference, 222, 182–194.
Yuasa, R., & Kubokawa, T. (2023b). Weighted shrinkage estimators of normal mean matrices and dominance properties. Journal of Multivariate Analysis, 194, 105138.
Zheng, Z. (1986). On estimation of matrix of normal mean. Journal of Multivariate Analysis, 18, 70–82.
Acknowledgements
The author would like to thank the reviewer for constructive comments. The author would also like to thank William Strawderman for helpful comments. This work was supported by JSPS KAKENHI under Grant Nos. 21H05205, 22K17865 and JST Moonshot under Grant No. JPMJMS2024.
Funding
Open access funding provided by The University of Tokyo.
Ethics declarations
Conflict of interest
On behalf of all authors, Takeru Matsuda states that there is no conflict of interest.
Matsuda, T. Matrix quadratic risk of orthogonally invariant estimators for a normal mean matrix. Jpn J Stat Data Sci 7, 313–328 (2024). https://doi.org/10.1007/s42081-023-00216-z