Abstract
For \(d \ge 2\), let X be a random vector having a Bingham distribution on \({\mathcal {S}}^{d-1}\), the unit sphere centered at the origin in \({\mathbb {R}}^d\), and let \(\Sigma \) denote the symmetric matrix parameter of the distribution. Let \(\Psi _d(\Sigma )\) be the normalizing constant of the distribution and let \(\nabla \Psi _d(\Sigma )\) be the matrix of first-order partial derivatives of \(\Psi _d(\Sigma )\) with respect to the entries of \(\Sigma \). We derive complete asymptotic expansions for \(\Psi _d(\Sigma )\) and \(\nabla \Psi _d(\Sigma )\), as \(d \rightarrow \infty \); these expansions are obtained subject to the growth condition that \(\Vert \Sigma \Vert \), the Frobenius norm of \(\Sigma \), satisfies \(\Vert \Sigma \Vert \le \gamma _0 d^{r/2}\) for all d, where \(\gamma _0 > 0\) and \(r \in [0,1)\). Consequently, we obtain for the covariance matrix of X an asymptotic expansion up to terms of arbitrary degree in \(\Sigma \). Using a range of values of d that have appeared in a variety of applications of high-dimensional spherical data analysis, we tabulate the bounds on the remainder terms in the expansions of \(\Psi _d(\Sigma )\) and \(\nabla \Psi _d(\Sigma )\), and we demonstrate the rapid convergence of the bounds to zero as r decreases.
References
Bagyan A (2015) Central limit theorems for randomly modulated sequences of random vectors with resampling and applications to statistics. Doctoral dissertation, Penn State University, University Park
Bhattacharya A, Bhattacharya R (2012) Nonparametric inference on manifolds: with applications to shape spaces. Cambridge University Press, New York
Bingham C (1974) An antipodally symmetric distribution on the sphere. Ann Stat 2:1201–1225
Bingham C, Chang T, Richards D (1992) Approximating the matrix Fisher and Bingham distributions: applications to spherical regression and Procrustes analysis. J Multivar Anal 41:314–337
Brignell CJ, Dryden IL, Gattone SA, Park B, Leask S, Browne WJ, Flynn S (2010) Surface shape analysis with an application to brain surface asymmetry in schizophrenia. Biostatistics 11(4):609–630
Brombin C, Pesarin F, Salmaso L (2011) Dealing with more variables than the sample size: an application to shape analysis. In: Hunter D, Richards D, Rosenberger J (eds) Nonparametric statistics and mixture models: a festschrift in honor of Thomas P. Hettmansperger. World Scientific Press, Singapore, pp 28–44
Chikuse Y (2003) Statistics on Special Manifolds, vol 174. Lecture Notes in Statistics. Springer, New York
Comtet L (1974) Advanced combinatorics: the art of finite and infinite expansions. D. Reidel, Dordrecht
Dai F, Dorman KS, Dutta S, Maitra R (2021) Exploratory factor analysis of data on a sphere. Preprint arXiv:2111.04940
Dryden IL (2005) Statistical analysis on high-dimensional spheres and shape spaces. Ann Stat 33:1643–1665
Dryden IL, Mardia KV (2016) Statistical shape analysis: with applications in R. Wiley, Chichester
Dwyer PS, MacPhail MS (1948) Symbolic matrix derivatives. Ann Math Stat, 517–534
Fisher NI, Lewis T, Embleton BJJ (1993) Statistical analysis of spherical data. Cambridge University Press, New York
Gross KI, Richards DStP (1987) Special functions of matrix argument. I. Algebraic induction, zonal polynomials, and hypergeometric functions. Trans Amer Math Soc 301:781–811
Hadjicosta E, Richards D (2020) Integral transform methods in goodness-of-fit testing, II: The Wishart distributions. Ann Inst Stat Math 72:1317–1370
Horn RA, Johnson CR (2013) Matrix analysis, 2nd edn. Cambridge University Press, New York
James AT (1964) Distributions of matrix variates and latent roots derived from normal samples. Ann Math Stat 35:475–501
Kume A, Walker SG (2014) On the Bingham distribution with large dimension. J Multivar Anal 124:345–352
Mardia KV, Jupp PE (2009) Directional statistics. Wiley, Chichester
Muirhead RJ (1982) Aspects of multivariate statistical theory. Wiley, New York
Patrangenaru V, Ellingson L (2015) Nonparametric statistics on manifolds and their applications to object data analysis. CRC Press, Boca Raton
Reznick B (1983) Some inequalities for products of power sums. Pac J Math 104:443–463
Richards DStP (1982) Differential operators associated with zonal polynomials. II. Ann Inst Stat Math 34:119–121
Sebastiani P (1996) On the derivatives of matrix powers. SIAM J Matrix Anal Appl 17:640–648
Sra S (2018) Directional statistics in machine learning: a brief review. In: Ley C, Verdebout T (eds) Applied directional statistics: modern methods and case studies. Chapman and Hall/CRC Press, New York, pp 275–292
Acknowledgements
The authors are grateful to the referees and the editors for helpful comments on the initial version of this article.
Funding
No funds, grants, or other support were received for conducting this research. The authors have no relevant financial or non-financial interests to disclose.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare.
A Appendix: Proofs
A.1 The proofs of Proposition 2.1 and Corollary 2.2
Proof of Proposition 2.1
By (2.5),
and by (2.1),
For each vector of indices \((i_1,i_2,\ldots ,i_k)\) such that \(i_1 + 2 i_2 + \cdots + k i_k = k\), we have
Denote by \(\lambda _1,\ldots ,\lambda _d\) the eigenvalues of \(\Sigma \). Then for each \(j=1,\ldots ,k\),
and by substituting this bound into (A.3) we obtain
By a remarkable result of Reznick (1983, p. 447, eq. (2.12)) we obtain, for each \(j=1,\ldots ,k\) and all d,
Substituting the bound in (A.5) into (A.4), we obtain
On applying (A.6) to (A.2) we obtain
where
We will derive in Lemma A.1 a single-sum expression for the multiple sum \(a_k\), and then we will deduce from that single-sum the inequality
Substituting (A.9) into (A.7) and recalling that \(\Vert \Sigma \Vert \le \gamma _0 d^{r/2}\) for all d, we obtain
Inserting (A.10) into (A.1) we obtain
As we shall show in Lemma A.2, with \(\gamma _1 = \tfrac{1}{2}(\sqrt{3}+1) \simeq 1.366025\),
for all \(d \ge 1\), \(k \ge 1\). Letting \(\gamma _2 = \gamma _1 \gamma _0\) and substituting (A.12) into (A.11), we find that for all d,
By applying the ratio test, we find that the series (A.13) converges absolutely for all d such that \(\gamma _2 d^{-(1-r)/2} < 1\), equivalently, \(d > \gamma _2^{2/(1-r)}\).
Note that although the kth term in (A.13) is \(O(d^{-k(1-r)/2})\), it is not evident that the sum of the resulting series is \(O(d^{-m(1-r)/2})\) as \(d \rightarrow \infty \). To establish that stated convergence rate we apply to (A.13) the Cauchy–Schwarz inequality, obtaining
For \(k \ge m\), it is straightforward that
equivalently,
also that
Therefore
Summing this inequality over all \(k \ge m\) we obtain
Noting that the inequality \(d \ge (2\gamma _2^2)^{1/(1-r)}\) is equivalent to
and applying (A.15) and (A.16) to (A.14) we obtain, for all \(d \ge (2\gamma _2^2)^{1/(1-r)}\), the inequality
This establishes (2.7) and proves that \(R_m(\Sigma ) = O(d^{-m(1-r)/2})\) as \(d \rightarrow \infty \), and the proof of the proposition now is complete. \(\square \)
Proof of Corollary 2.2
By (A.17) with \(m=1\) we have, for all \(d \ge (2\gamma _2^2)^{1/(1-r)}\),
For \(d > (6\gamma _2^2)^{1/(1-r)}\), we use the values of the constants given in (2.6) to determine that the right-hand side of (A.18) is strictly less than 1. Then we apply the geometric series to obtain, for all \(l \ge 2\),
By Proposition 2.1, \(R_l(\Sigma ) = O(d^{-l(1-r)/2})\). On applying (A.17) with \(m=1\) we obtain, for all \(d > (6\gamma _2^2)^{1/(1-r)}\),
By summing this geometric series, we deduce that (A.20) is \(O(d^{-(1-r)})\) as \(d \rightarrow \infty \). Therefore by (A.19), for all \(l \ge 2\) and all \(d > (6\gamma _2^2)^{1/(1-r)}\),
On applying the well-known property,
for \(a,b > 0\), we obtain (2.8). \(\square \)
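The dimension thresholds that appear in the two proofs above can be tabulated with a short script. This is only a sketch: the function names and dictionary keys are hypothetical labels introduced here, and the sample values of \(\gamma _0\) and r are arbitrary illustrations rather than values taken from the article.

```python
import math

# Constant gamma_1 = (sqrt(3) + 1) / 2 from Lemma A.2, about 1.366025.
GAMMA_1 = (math.sqrt(3.0) + 1.0) / 2.0


def dimension_thresholds(gamma_0: float, r: float) -> dict:
    """Lower bounds on d used in the proofs (key names are hypothetical).

    gamma_0 > 0 and 0 <= r < 1 are the constants in the growth condition
    ||Sigma|| <= gamma_0 * d**(r/2); gamma_2 = gamma_1 * gamma_0.
    """
    if gamma_0 <= 0 or not (0.0 <= r < 1.0):
        raise ValueError("need gamma_0 > 0 and r in [0, 1)")
    gamma_2 = GAMMA_1 * gamma_0
    p = 1.0 / (1.0 - r)
    return {
        # d > gamma_2^{2/(1-r)}: ratio-test convergence of the series
        "ratio_test": gamma_2 ** (2.0 * p),
        # d >= (2 gamma_2^2)^{1/(1-r)}: range of the remainder bound
        "remainder_bound": (2.0 * gamma_2**2) ** p,
        # d > (6 gamma_2^2)^{1/(1-r)}: range used in Corollary 2.2
        "corollary": (6.0 * gamma_2**2) ** p,
    }


def ratio(gamma_0: float, r: float, d: float) -> float:
    """The quantity gamma_2 * d^{-(1-r)/2} that the ratio test compares with 1."""
    return GAMMA_1 * gamma_0 * d ** (-(1.0 - r) / 2.0)


if __name__ == "__main__":
    # Illustrative values only: gamma_0 = 1 and a few choices of r.
    for r in (0.0, 0.25, 0.5):
        t = dimension_thresholds(gamma_0=1.0, r=r)
        print(r, {name: round(value, 3) for name, value in t.items()})
```

At the threshold \(d = (2\gamma _2^2)^{1/(1-r)}\) the squared ratio equals 1/2, which matches the equivalence stated in the proof; the thresholds grow quickly as r approaches 1 because of the exponent \(1/(1-r)\).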
A.2 The proof of Proposition 3.2 and Theorem 3.3
Proof of Lemma 3.1
The result (3.2) follows by straightforwardly applying each element of the matrix \(\nabla \) to the function \(\exp \big (\hspace{1.0pt}\textrm{tr}\,(\Sigma H)\big )\).
The formula (3.3) follows from the chain rule:
As for (3.4), that result can be deduced by arguments similar to those given by Sebastiani (1996, Lemmas 3.1 or 5.1). See also Dwyer and MacPhail (1948, p. 528, Section 14) for the derivatives of the trace of powers of square, non-symmetric matrices; calculations similar to theirs can also lead to (3.4).
A succinct derivation of (3.4) is obtained from the Taylor expansion (3.1), as follows. By using the commutativity property of the trace, i.e., \(\hspace{1.0pt}\textrm{tr}\,(\Sigma H) = \hspace{1.0pt}\textrm{tr}\,(H \Sigma )\), we find that
as \(H \rightarrow 0\). Setting \(f(\Sigma ) = \hspace{1.0pt}\textrm{tr}\,(\Sigma ^k)\) in (3.1) and comparing the result with (A.22), we obtain (3.4). \(\square \)
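The first-order behaviour of \(\hspace{1.0pt}\textrm{tr}\,\big ((\Sigma + H)^k\big )\) as \(H \rightarrow 0\) can be checked numerically. The sketch below assumes the standard expansion \(\hspace{1.0pt}\textrm{tr}\,\big ((\Sigma +H)^k\big ) = \hspace{1.0pt}\textrm{tr}\,(\Sigma ^k) + k\,\hspace{1.0pt}\textrm{tr}\,(\Sigma ^{k-1}H) + O(\Vert H\Vert ^2)\), which is a well-known fact and not necessarily the exact form of (A.22); all helper names are hypothetical.

```python
import random


def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][l] * b[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]


def matpow(a, k):
    """k-th power of a square matrix, with a^0 the identity."""
    n = len(a)
    out = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        out = matmul(out, a)
    return out


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def random_symmetric(n, rng):
    """Symmetric n x n matrix with entries uniform on [-1, 1]."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            m[i][j] = m[j][i] = rng.uniform(-1.0, 1.0)
    return m


def first_order_error(sigma, h, k):
    """|tr((Sigma+H)^k) - tr(Sigma^k) - k tr(Sigma^{k-1} H)|."""
    n = len(sigma)
    shifted = [[sigma[i][j] + h[i][j] for j in range(n)] for i in range(n)]
    linear = k * trace(matmul(matpow(sigma, k - 1), h))
    return abs(trace(matpow(shifted, k)) - trace(matpow(sigma, k)) - linear)


if __name__ == "__main__":
    rng = random.Random(0)
    sigma = random_symmetric(4, rng)
    direction = random_symmetric(4, rng)
    # The error should shrink roughly like eps^2 as H = eps * direction -> 0.
    for eps in (1e-1, 1e-2, 1e-3):
        h = [[eps * x for x in row] for row in direction]
        print(eps, first_order_error(sigma, h, k=3))
```

For k = 2 the error term is exactly \(\hspace{1.0pt}\textrm{tr}\,(H^2)\), because the two cross terms are equal by the commutativity property of the trace.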
Proof of Proposition 3.2
By the subadditivity property of the Frobenius norm, we have
For \(k \ge 2\), it follows from (2.1) that
where
is a polynomial in \(\{\hspace{1.0pt}\textrm{tr}\,(\Sigma ),\hspace{1.0pt}\textrm{tr}\,(\Sigma ^2),\ldots ,\hspace{1.0pt}\textrm{tr}\,(\Sigma ^k)\}\). It follows from (A.24) that \(p_k(\Sigma )\) is homogeneous of degree k in \(\Sigma \) and its coefficients do not depend on d; moreover, because of the restriction \(i_1 \le k-2\), the highest power of \(\hspace{1.0pt}\textrm{tr}\,(\Sigma )\) which can appear in \(p_k(\Sigma )\) is \(k-2\).
On applying to (A.25) the subadditivity property of the Frobenius norm we obtain
By (A.24) and the product rule for derivatives,
By (3.4) and the chain rule,
hence,
which also reveals that \(\nabla p_k(\Sigma )\) is homogeneous of degree \(k-1\). On applying the subadditivity property of the Frobenius norm, we obtain
On applying (A.6) we have, for all d,
By also applying the submultiplicative inequality, \(\Vert \Sigma ^{l-1}\Vert \le \Vert \Sigma \Vert ^{l-1}\), we obtain
and by substituting this bound into (A.27), we obtain
On applying the latter inequality at (A.26), we find that
Setting \((i_1,i_2,\ldots ,i_k) = (1,0,\ldots ,0)\) in (A.6), we obtain
hence
On applying the inequalities \(\Vert \Sigma \Vert \le \gamma _0 d^{r/2}\) and (A.32) we obtain
Therefore
and, by applying the ratio test, we find that the latter series converges for all \(d > \gamma _2^{2/(1-r)}\).
On applying the Cauchy–Schwarz inequality to (A.28), we obtain
For any nonnegative integers a and b, it is elementary that
therefore
Also, for all d such that \(\gamma _2^2 d^{-(1-r)} \le 1/2\), equivalently, \(d \ge (2\gamma _2^2)^{1/(1-r)}\), we have
and then we obtain
Therefore \(\Vert \nabla R_m(\Sigma )\Vert = O(d^{-[1+(m-1)(1-r)]/2})\) as \(d \rightarrow \infty \). \(\square \)
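The submultiplicative inequality \(\Vert \Sigma ^{l-1}\Vert \le \Vert \Sigma \Vert ^{l-1}\) used in the proof above is easy to illustrate for the Frobenius norm. The following sketch uses hypothetical helper names and an arbitrary test matrix; it is a numerical illustration only, not part of the argument.

```python
import math


def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][l] * b[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]


def frobenius(a):
    """Frobenius norm: square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in a for x in row))


def power_norm_table(sigma, max_power):
    """Return [(p, ||Sigma^p||, ||Sigma||^p)] for p = 1, ..., max_power."""
    rows, current, base = [], sigma, frobenius(sigma)
    for p in range(1, max_power + 1):
        rows.append((p, frobenius(current), base**p))
        current = matmul(current, sigma)  # now holds Sigma^(p+1)
    return rows


if __name__ == "__main__":
    # Arbitrary symmetric example matrix, chosen for illustration.
    sigma = [[2.0, 1.0, 0.0], [1.0, -1.0, 0.5], [0.0, 0.5, 0.3]]
    for p, lhs, rhs in power_norm_table(sigma, 5):
        print(p, round(lhs, 4), round(rhs, 4), lhs <= rhs * (1 + 1e-12))
```

The gap between the two columns typically widens as the power grows, so the inequality is convenient but usually not tight.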
Proof of Theorem 3.3
Since \(\nabla C_{(0)}(\Sigma ) = 0\), the zonal polynomial expansion (2.3) of \(\Psi _d(\Sigma )\) gives
and, by Proposition 3.2, \(\Vert \nabla R_m(\Sigma )\Vert = O(d^{-[1+(m-1)(1-r)]/2})\) as \(d \rightarrow \infty \).
On applying the asymptotic expansion of \([\Psi _d(\Sigma )]^{-1}\) given in (2.8), and using (A.29), we obtain
Expanding this product, we obtain
On applying Proposition 3.2, we obtain
therefore
Similarly, by Proposition 2.1,
and therefore
The last \(O(\cdot )\) term in (A.30) is
Collecting all \(O(\cdot )\) terms in (A.30), we obtain
On applying (A.21), we obtain (3.7). \(\square \)
A.3 The proofs of (A.9) and (A.12)
First, we establish the inequality (A.9) for the coefficients \(a_k\) defined in (A.8).
Lemma A.1
For \(k = 0,1,2,\ldots \),
Proof
First, we follow the approach of Comtet (1974, p. 97, Eq. 2d) to derive an explicit formula for \(a_k\).
Let \(t = (t_1,t_2,t_3,\ldots )\) be a vector of indeterminates, and define
\(k \ge 0\). For an indeterminate u, a formal generating-function for the sequence \(\{a_k(t), k=0,1,2,\ldots \}\) is
Now set \(t_1 = d^{1/2}\) and \(t_j = 1\) for all \(j \ge 2\). Then \(a_k(t)\) reduces to \(a_k\); \(G_t(u)\) reduces to G(u), the formal generating-function for the sequence \(\{a_k\}\); and we obtain
Expanding both of the latter functions in infinite series, we obtain
By comparing the coefficients of \(u^k\) in \(G_t(u)\) and G(u), we obtain the equality in (A.31).
Since \(\big ((d^{1/2}-1)/2\big )^l \le \big ((d^{1/2}-1)/2\big )_l\) then, by applying the well-known convolution identity,
\(v_1, v_2 \in {\mathbb {R}}\), we obtain
The proof of (A.31) now is complete. \(\square \)
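Two ingredients of the proof above can be verified in exact rational arithmetic. The sketch assumes that the convolution identity in question is the Chu-Vandermonde identity for rising factorials, \((v_1+v_2)_k = \sum _{l=0}^{k} \binom{k}{l} (v_1)_l (v_2)_{k-l}\); that reading is an assumption, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def rising(x, n):
    """Rising factorial (x)_n = x (x+1) ... (x+n-1), with (x)_0 = 1."""
    out = Fraction(1)
    for i in range(n):
        out *= x + i
    return out


def vandermonde_rhs(v1, v2, k):
    """Right-hand side of the assumed Chu-Vandermonde identity."""
    return sum(comb(k, l) * rising(v1, l) * rising(v2, k - l)
               for l in range(k + 1))


if __name__ == "__main__":
    # Identity check for arbitrary rational v1, v2 (negative values allowed).
    v1, v2 = Fraction(3, 2), Fraction(-2, 5)
    for k in range(6):
        print(k, rising(v1 + v2, k) == vandermonde_rhs(v1, v2, k))

    # Power versus rising factorial at x = (sqrt(d) - 1) / 2, for d = s^2,
    # so that x is rational and nonnegative.
    for s in range(1, 6):
        x = Fraction(s - 1, 2)
        print(s * s, all(x**l <= rising(x, l) for l in range(8)))
```

The inequality \(x^l \le (x)_l\) holds for every \(x \ge 0\) because each factor \(x + i\) of the rising factorial is at least x; the restriction to perfect squares d in the script only keeps the arithmetic exact.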
Next, we establish (A.12).
Lemma A.2
Let \(\gamma _1 = \tfrac{1}{2}(\sqrt{3}+1)\). Then, for all \(k \ge 1\) and \(d \ge 1\),
Proof
Denote by \(r_k\) the left-hand side of (A.32). Then \(r_1 = d^{-1/2}\) and, for \(j \ge 1\),
We claim that
\(j \ge 1\), and we prove this as follows. Define \(y \equiv y(t) = (t^2+2jt)/(t^2+2j)\), \(t \ge 1\). It is simple to verify that y(t) attains its maximum at \(t_j:= 1 + (2j+1)^{1/2}\) and that
For \(j \ge 1\), it is also straightforward to show that
Consequently,
Multiplying the left- and right-hand sides of the above inequality by \(d^{-1/2}\), we obtain (A.33). \(\square \)
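The location of the maximum of \(y(t) = (t^2+2jt)/(t^2+2j)\) can be confirmed numerically. Differentiating gives a numerator proportional to \(-(t^2 - 2t - 2j)\), whose positive root is \(t_j = 1 + (2j+1)^{1/2}\), and at that point \(y(t_j) = t_j/2\). The sketch below checks both facts on a grid; the function names and the grid itself are choices made here for illustration.

```python
import math


def y(t, j):
    """The function y(t) = (t^2 + 2 j t) / (t^2 + 2 j) from the proof."""
    return (t * t + 2 * j * t) / (t * t + 2 * j)


def t_star(j):
    """Claimed maximizer t_j = 1 + sqrt(2 j + 1)."""
    return 1.0 + math.sqrt(2 * j + 1)


def grid_max(j, upper=50.0, step=0.01):
    """Largest value of y(., j) on a uniform grid over [1, upper]."""
    count = int((upper - 1.0) / step) + 1
    return max(y(1.0 + step * i, j) for i in range(count))


if __name__ == "__main__":
    for j in range(1, 6):
        ts = t_star(j)
        # Grid maximum should not exceed y(t_j), and y(t_j) should equal t_j / 2.
        print(j, round(ts, 6), round(y(ts, j), 6), grid_max(j) <= y(ts, j) + 1e-12)
```

For j = 1 the maximum value is \((1+\sqrt{3})/2\), which coincides with the constant \(\gamma _1\) of Lemma A.2.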
About this article
Cite this article
Bagyan, A., Richards, D. Complete asymptotic expansions and the high-dimensional Bingham distributions. TEST 33, 540–563 (2024). https://doi.org/10.1007/s11749-023-00910-w
DOI: https://doi.org/10.1007/s11749-023-00910-w
Keywords
- Confluent hypergeometric function of matrix argument
- Frobenius norm
- Gradient operator
- Power sum symmetric function
- Zonal polynomial