On convergence of covariance matrix of empirical Bayes hyper-parameter estimator

  • Research Article
  • Published in: Control Theory and Technology

Abstract

Regularized system identification has been at the research frontier of system identification over the past decade. A core subject in this area is the study of the convergence properties of hyper-parameter estimators as the sample size goes to infinity. In this paper, we consider one commonly used hyper-parameter estimator, the empirical Bayes (EB) estimator. Its convergence in distribution has been studied, and an explicit expression for the covariance matrix of its limiting distribution has been given. However, our real interest lies in the factors that enter the covariance matrix of the EB hyper-parameter estimator itself, which requires that this covariance matrix converge to that of the limiting distribution. In general, convergence in distribution of a sequence of random variables does not guarantee convergence of its covariance matrix. Establishing this convergence is therefore a necessary complement to our theoretical analysis of the factors that influence the convergence properties of the EB hyper-parameter estimator. We consider regularized finite impulse response (FIR) model estimation with deterministic inputs and show that the covariance matrix of the EB hyper-parameter estimator converges to that of its limiting distribution. Numerical simulations illustrate the theoretical results.

Notes

  1. A sequence of random variables \(\xi _{N}\in {\mathbb {R}}^{d}\) converges in distribution to a random variable \(\xi \in {\mathbb {R}}^{d}\), if \(\lim _{N\rightarrow \infty }\text {Pr}(\xi _{N}\le x)=\text {Pr}(\xi \le x)\) for every x at which the limit distribution function \(\text {Pr}(\xi \le x)\) is continuous, where the map \(x\mapsto \text {Pr}(\xi \le x)\) denotes the distribution function of \(\xi \) and \(\text {Pr}(\cdot )\) is a probability function. It can be written as \(\xi _{N} \overset{d.}{\rightarrow }\xi \).
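
The gap between convergence in distribution and convergence of the covariance can be seen in a standard textbook-style counterexample, which is not taken from the paper. Take \(X_{N}\) equal to N with probability 1/N and 0 otherwise (a hypothetical sequence chosen only for illustration). Its distribution function converges to that of the constant 0 at every \(x\ne 0\), yet its variance equals \(N-1\) and diverges. A minimal sketch with exact rational arithmetic:

```python
from fractions import Fraction

def two_point_moments(n):
    """Exact mean and variance of X_n, where X_n = n w.p. 1/n and 0 otherwise."""
    p = Fraction(1, n)
    mean = n * p                  # always equals 1
    second_moment = n * n * p     # equals n
    return mean, second_moment - mean ** 2

def cdf_at(n, x):
    """Pr(X_n <= x) for the two-point variable above."""
    if x < 0:
        return Fraction(0)
    if x < n:
        return 1 - Fraction(1, n)
    return Fraction(1)

# The cdf at x = 0.5 approaches 1 (the cdf of the constant 0),
# while the variance n - 1 grows without bound.
for n in (10, 100, 1000):
    mean, var = two_point_moments(n)
    print(n, float(cdf_at(n, 0.5)), float(mean), float(var))
```

The limiting variable has variance 0, so the variances do not converge to that of the limit. Moment bounds such as those in Lemma A.2 are what rule out this kind of escaping mass.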

References

  1. Pillonetto, G., & De Nicolao, G. (2010). A new kernel-based approach for linear system identification. Automatica, 46(1), 81–93.

  2. Chen, T., Ohlsson, H., & Ljung, L. (2012). On the estimation of transfer functions, regularizations and gaussian processes—revisited. Automatica, 48(8), 1525–1535.

  3. Pillonetto, G., Dinuzzo, F., Chen, T., De Nicolao, G., & Ljung, L. (2014). Kernel methods in system identification, machine learning and function estimation: A survey. Automatica, 50(3), 657–682.

  4. Chiuso, A. (2016). Regularization and Bayesian learning in dynamical systems: Past, present and future. Annual Reviews in Control, 41, 24–38.

  5. Pillonetto, G., Chen, T., Chiuso, A., De Nicolao, G., & Ljung, L. (2022). Regularized system identification: Learning dynamic models from data. Springer.

  6. Ljung, L., Chen, T., & Mu, B. (2020). A shift in paradigm for system identification. International Journal of Control, 93(2), 173–180.

  7. Chen, T. (2018). On kernel design for regularized LTI system identification. Automatica, 90, 109–122.

  8. Zorzi, M., & Chiuso, A. (2018). The harmonic analysis of kernel functions. Automatica, 94, 125–137.

  9. Xu, Y., Mu, B., & Chen, T. (2022). On kernel design for regularized Volterra series model with application to Wiener system identification. In 2022 41st Chinese control conference (CCC), pp. 1503–1508. https://doi.org/10.23919/CCC55666.2022.9902870

  10. Fang, X., & Chen, T. (2022). On kernel design for non-causal systems with application to feedforward control. In 2022 41st Chinese control conference (CCC), pp. 1523–1528. https://doi.org/10.23919/CCC55666.2022.9902450

  11. Chen, T., & Ljung, L. (2013). Implementation of algorithms for tuning parameters in regularized least squares problems in system identification. Automatica, 49(7), 2213–2220.

  12. Chen, T., & Andersen, M. S. (2021). On semiseparable kernels and efficient implementation for regularized system identification and function estimation. Automatica, 132, 109682.

  13. Zhang, J., Ju, Y., Mu, B., Zhong, R., & Chen, T. (2023). An efficient implementation for spatial-temporal Gaussian process regression and its applications. Automatica, 147, 110679.

  14. Pillonetto, G., & Chiuso, A. (2015). Tuning complexity in regularized kernel-based regression and linear system identification: The robustness of the marginal likelihood estimator. Automatica, 58, 106–117.

  15. Mu, B., Chen, T., & Ljung, L. (2018). On asymptotic properties of hyperparameter estimators for kernel-based regularization methods. Automatica, 94, 381–395.

  16. Mu, B., Chen, T., & Ljung, L. (2018). Asymptotic properties of generalized cross validation estimators for regularized system identification. IFAC-PapersOnLine, 51(15), 203–208.

  17. Hjalmarsson, H. Dynamic model learning: A geometric perspective (Lecture Notes in FEL3201/FEL3202). https://www.kth.se/social/files/5bd998d056be5b5554039a31/dml_lecturenotes.pdf

  18. Hjalmarsson, H. (2020). Estimation accuracy of kernel-based estimators (IFAC WC workshop).

  19. Mu, B., Chen, T., & Ljung, L. (2021). On the asymptotic optimality of cross-validation based hyper-parameter estimators for regularized least squares regression problems. arXiv:2104.10471

  20. Ju, Y., Mu, B., Ljung, L., & Chen, T. (2023). Asymptotic theory for regularized system identification part I: Empirical Bayes hyper-parameter estimator. IEEE Transactions on Automatic Control, 68, 7224–7239.

  21. Aravkin, A., Burke, J., Chiuso, A., & Pillonetto, G. (2012). On the MSE properties of empirical Bayes methods for sparse estimation. In IFAC symposium on system identification, pp. 965–970. Brussels, Belgium.

  22. Mallows, C. L. (1973). Some comments on \(C_p\). Technometrics, 15(4), 661–675.

  23. Allen, D. M. (1974). The relationship between variable selection and data agumentation and a method for prediction. Technometrics, 16(1), 125–127.

  24. Craven, P., & Wahba, G. (1979). Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation. Numerische Mathematik, 31, 377–403.

  25. Golub, G. H., Heath, M., & Wahba, G. (1979). Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics, 21(2), 215–223.

  26. Pillonetto, G., Dinuzzo, F., Chen, T., De Nicolao, G., & Ljung, L. (2014). Kernel methods in system identification, machine learning and function estimation: A survey. Automatica, 50(3), 657–682.

  27. Ljung, L. (1999). System identification: Theory for the user. Prentice Hall.

  28. Horn, R. A., & Johnson, C. R. (2012). Matrix analysis (2nd ed.). Cambridge University Press.

  29. Gut, A. (2013). Probability: A graduate course (Vol. 75, p. 384). Springer.

  30. Chung, K. L. (2001). A course in probability theory (3rd ed.). Academic Press.

  31. Zorich, V. A., & Paniagua, O. (2016). Mathematical analysis I (Vol. 220). Springer.

  32. Ju, Y., Chen, T., Mu, B., & Ljung, L. (2021). Tutorial on asymptotic properties of regularized least squares estimator for finite impulse response model. arXiv:2112.10319 [math.ST]

Author information

Corresponding author

Correspondence to Tianshi Chen.

Additional information

This work was supported in part by the National Natural Science Foundation of China (No. 62273287), by the Shenzhen Science and Technology Innovation Council (Nos. JCYJ20220530143418040, JCY20170411102101881), and the Thousand Youth Talents Plan funded by the central government of China.

Appendix A

This section includes the proof of Proposition 1.

A.1 Proof of Proposition 1

To prove Proposition 1, we first derive some preliminary results, stated in Lemmas A.1 and A.2.

Lemma A.1

There exists a compact set \(\widetilde{\varOmega }\subset \varOmega \) containing \(\eta _{{{\,\textrm{b}\,}}}^{*}\), such that for \(k,l=1,2,\ldots ,p\)

$$\begin{aligned}&\Vert P\Vert _{F}\ \text {and}\ \Vert \hat{S}^{-1}\Vert _{F}<\Vert P^{-1}\Vert _{F}\ \text {are}\ \text {bounded}, \end{aligned}$$
(A1a)
$$\begin{aligned}&\left\| \frac{\partial P}{\partial \eta _{l}}\right\| _{F}\ \text {and}\ \left\| \frac{\partial ^2 P}{\partial \eta _{k}\partial \eta _{l}}\right\| _{F}\ \text {are}\ \text {bounded}, \end{aligned}$$
(A1b)
$$\begin{aligned}&\left\| \frac{\partial \hat{S}^{-1}}{\partial \eta _{k}}\right\| _{F}\ \text {and}\ \left\| \frac{\partial P^{-1}}{\partial \eta _{k}} \right\| _{F}\ \text {are}\ \text {bounded}, \end{aligned}$$
(A1c)
$$\begin{aligned}&\left\| \frac{\partial ^2 \hat{S}^{-1}}{\partial \eta _{k}\partial \eta _{l}}\right\| _{F}\ \text {and}\ \left\| \frac{\partial ^2 P^{-1}}{\partial \eta _{k}\partial \eta _{l}}\right\| _{F}\ \text {are}\ \text {bounded}, \end{aligned}$$
(A1d)

and there exists \(\widetilde{M}_{1,{{\,\textrm{b}\,}}}>0\), irrespective of \(\eta \), such that

$$\begin{aligned}&\sup _{\eta \in \widetilde{\varOmega }}\left\| \frac{\partial \hat{S}^{-1}}{\partial \eta _{l}}-\frac{\partial P^{-1}}{\partial \eta _{l}}\right\| _{F}\!\!\le \! \frac{1}{N}\widetilde{M}_{1,{{\,\textrm{b}\,}}}\widehat{\sigma ^2}\Vert N(\varPhi ^\textrm{T}\varPhi )^{-1}\Vert _{F}, \end{aligned}$$
(A2a)
$$\begin{aligned}&\sup _{\eta \in \widetilde{\varOmega }}\left\| \frac{\partial ^2 \hat{S}^{-1}}{\partial \eta _{k}\partial \eta _{l}}-\frac{\partial ^2 P^{-1}}{\partial \eta _{k}\partial \eta _{l}}\right\| _{F}\!\!\le \! \frac{1}{N}\widetilde{M}_{1,{{\,\textrm{b}\,}}}\widehat{\sigma ^2}\Vert N(\varPhi ^\textrm{T}\varPhi )^{-1}\Vert _{F}, \end{aligned}$$
(A2b)

where \(\hat{S}\) is defined in (23).

Proof

If Assumption 4 holds, we have (A1a)–(A1b). Moreover, note that

$$\begin{aligned} \frac{\partial P^{-1}}{\partial \eta _{k}}&= -P^{-1}\frac{\partial P}{\partial \eta _{k}}P^{-1} \end{aligned}$$
(A3a)
$$\begin{aligned} \frac{\partial \hat{S}^{-1}}{\partial \eta _{k}}&=-\hat{S}^{-1}\frac{\partial P}{\partial \eta _{k}}\hat{S}^{-1}, \end{aligned}$$
(A3b)
$$\begin{aligned} \frac{\partial ^2 P^{-1}}{\partial \eta _{k}\partial \eta _{l}}&=P^{-1}\frac{\partial P}{\partial \eta _{l}}P^{-1}\frac{\partial P}{\partial \eta _{k}}P^{-1} -P^{-1}\frac{\partial ^2 P}{\partial \eta _{k}\partial \eta _{l}}P^{-1}\nonumber \\&\quad +P^{-1}\frac{\partial P}{\partial \eta _{k}}P^{-1}\frac{\partial P}{\partial \eta _{l}}P^{-1} \end{aligned}$$
(A3c)
$$\begin{aligned} \frac{\partial ^2 \hat{S}^{-1}}{\partial \eta _{k}\partial \eta _{l}}&=\hat{S}^{-1}\frac{\partial P}{\partial \eta _{l}}\hat{S}^{-1}\frac{\partial P}{\partial \eta _{k}}\hat{S}^{-1} -\hat{S}^{-1}\frac{\partial ^2 P}{\partial \eta _{k}\partial \eta _{l}}\hat{S}^{-1}\nonumber \\&\quad +\hat{S}^{-1}\frac{\partial P}{\partial \eta _{k}}\hat{S}^{-1}\frac{\partial P}{\partial \eta _{l}}\hat{S}^{-1}. \end{aligned}$$
(A3d)

It can be seen that \(\frac{\partial \hat{S}^{-1}}{\partial \eta _{k}}\), \(\frac{\partial P^{-1}}{\partial \eta _{k}}\), \(\frac{\partial ^2 \hat{S}^{-1}}{\partial \eta _{k}\partial \eta _{l}}\) and \(\frac{\partial ^2 P^{-1}}{\partial \eta _{k}\partial \eta _{l}}\) are all sums of products of \(P^{-1}\), \(\hat{S}^{-1}\), \(\frac{\partial P}{\partial \eta _{k}}\) and \(\frac{\partial ^2 P}{\partial \eta _{k}\partial \eta _{l}}\). Since (A1a) and (A1b) both hold for all \(\eta \in \widetilde{\varOmega }\), it follows that (A1c) and (A1d) also hold.
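
For completeness, here is a short sketch of where (A3a) comes from and how it yields the first-derivative bound in (A1c). It differentiates the identity \(PP^{-1}=I_{n}\) and then uses the submultiplicativity of the Frobenius norm:

$$\begin{aligned}&0=\frac{\partial (PP^{-1})}{\partial \eta _{k}}=\frac{\partial P}{\partial \eta _{k}}P^{-1}+P\frac{\partial P^{-1}}{\partial \eta _{k}} \ \Longrightarrow \ \frac{\partial P^{-1}}{\partial \eta _{k}}=-P^{-1}\frac{\partial P}{\partial \eta _{k}}P^{-1},\\&\left\| \frac{\partial P^{-1}}{\partial \eta _{k}}\right\| _{F}\le \Vert P^{-1}\Vert _{F}^2\left\| \frac{\partial P}{\partial \eta _{k}}\right\| _{F}. \end{aligned}$$

The same argument applied to \(\hat{S}\) gives (A3b), provided that \(\partial \hat{S}/\partial \eta _{k}=\partial P/\partial \eta _{k}\), that is, provided \(\hat{S}-P\) does not depend on \(\eta \). This is our reading of (23), which is not reproduced in this appendix. Differentiating (A3a) once more with respect to \(\eta _{l}\) by the product rule gives the three terms of (A3c).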

Moreover, since

$$\begin{aligned}&\frac{\partial \hat{S}^{-1}}{\partial \eta _{l}}-\frac{\partial P^{-1}}{\partial \eta _{l}}\\&\quad =(P^{-1}-\hat{S}^{-1})\frac{\partial P}{\partial \eta _{l}}P^{-1} +\hat{S}^{-1}\frac{\partial P}{\partial \eta _{l}}(P^{-1}-\hat{S}^{-1})\\&\quad =\widehat{\sigma ^2}P^{-1}(\varPhi ^\textrm{T}\varPhi )^{-1}\hat{S}^{-1}\frac{\partial P}{\partial \eta _{l}}P^{-1}\\&\qquad +\widehat{\sigma ^2}\hat{S}^{-1}\frac{\partial P}{\partial \eta _{l}}P^{-1}(\varPhi ^\textrm{T}\varPhi )^{-1}\hat{S}^{-1}, \end{aligned}$$

we have

$$\begin{aligned}&\sup _{\eta \in \widetilde{\varOmega }}\left\| \frac{\partial \hat{S}^{-1}}{\partial \eta _{l}}-\frac{\partial P^{-1}}{\partial \eta _{l}}\right\| _{F}\nonumber \\&\quad \le \frac{1}{N}\widehat{\sigma ^2}\Vert N(\varPhi ^\textrm{T}\varPhi )^{-1}\Vert _{F}\sup _{\eta \in \widetilde{\varOmega }}\left\| \frac{\partial P}{\partial \eta _{l}}\right\| _{F}\nonumber \\&\qquad \cdot \bigg (\sup _{\eta \in \widetilde{\varOmega }}\Vert P^{-1}\Vert _{F}^2\sup _{\eta \in \widetilde{\varOmega }}\Vert \hat{S}^{-1}\Vert _{F}\nonumber \\&\qquad +\sup _{\eta \in \widetilde{\varOmega }}\Vert \hat{S}^{-1}\Vert _{F}^2\sup _{\eta \in \widetilde{\varOmega }}\Vert P^{-1}\Vert _{F}\bigg ). \end{aligned}$$
(A4)

Using (A1a)–(A1b) and [32, Lemma B.1], we can see that (A4) leads to (A2a). Similarly, we can also derive (A2b) from [32, Lemma B.1], (A1c)–(A1d).
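
To see the \(1/N\) rate in (A2a) concretely, consider a scalar caricature (\(n=1\)) in which \(P(\eta )=\eta \) and \(\hat{S}=\eta +s/N\). Here s is a hypothetical constant standing in for \(\widehat{\sigma ^2}\,N(\varPhi ^\textrm{T}\varPhi )^{-1}\); the numbers below are illustrative assumptions, not values from the paper. The sketch checks that N times the derivative gap stays bounded:

```python
def derivative_gap(eta, s, N):
    """|d/d eta (1/S_hat) - d/d eta (1/P)| for scalar P = eta, S_hat = eta + s / N."""
    d_inv_P = -1.0 / eta ** 2
    d_inv_S = -1.0 / (eta + s / N) ** 2
    return abs(d_inv_S - d_inv_P)

# Hypothetical values: s plays the role of sigma2_hat * N * (Phi^T Phi)^{-1}.
eta, s = 0.5, 2.0
for N in (10, 100, 1000, 10000):
    gap = derivative_gap(eta, s, N)
    # N * gap increases towards the finite limit 2 * s / eta**3.
    print(N, gap, N * gap)
```

In this scalar case the scaled gap increases monotonically to \(2s/\eta ^{3}\), so the gap itself is of order \(1/N\), in line with the right-hand side of (A2a).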

Lemma A.2

Under Assumptions 1 and 2, there exists \(\widetilde{M}_{2,{{\,\textrm{b}\,}}}>0\), such that

$$\begin{aligned}&{\mathbb {E}}\bigg (\bigg \Vert \frac{\varPhi ^\textrm{T}V}{N}\bigg \Vert _{2}^8\bigg )\le \frac{1}{N^4}\widetilde{M}_{2,{{\,\textrm{b}\,}}}, \end{aligned}$$
(A5)
$$\begin{aligned}&{\mathbb {E}}\big [\big (\widehat{\sigma ^2}\big )^8\big ]\le \widetilde{M}_{2,{{\,\textrm{b}\,}}}, \end{aligned}$$
(A6)
$$\begin{aligned}&{\mathbb {E}}\left( \Vert \hat{\theta }^{{{\,\textrm{LS}\,}}}\Vert _{2}^8 \right) \le \widetilde{M}_{2,{{\,\textrm{b}\,}}}. \end{aligned}$$
(A7)

Proof

We shall derive (A5)–(A7), respectively.

For (A5), we have

$$\begin{aligned}&{\mathbb {E}}\bigg (\bigg \Vert \frac{\varPhi ^{\textrm{T}}V}{N}\bigg \Vert _{2}^8\bigg )\nonumber \\&\quad =\frac{1}{N^8}{\mathbb {E}}\left\{ \textstyle \sum \limits _{i=1}^{n} \left[ \textstyle \sum \limits _{t=1}^{N}u(t)v(t+i)\right] ^2\right\} ^4\nonumber \\&\quad \le \frac{1}{N^8} (2^3)^{n-1}\textstyle \sum \limits _{i=1}^{n}\left\{ {\mathbb {E}}\left[ \textstyle \sum \limits _{t=1}^{N}u(t)v(t+i)\right] ^8 \right\} \nonumber \\&\qquad (\text {using}~\text {[29, Theorem 2.2]})\nonumber \\&\quad \le \frac{1}{N^4}\widetilde{M}_{2,{{\,\textrm{b}\,}}}, \end{aligned}$$
(A8)

where the last step holds because \(\{v(t)\}_{t=1}^{N}\) is independent and has bounded moments of order \(8+\delta \) for some \(\delta >0\), and because u(t) is bounded, as stated in Assumptions 1 and 2. Note that boundedness of higher-order moments implies boundedness of lower-order ones.
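
As a sanity check on the \(N^{-4}\) rate in (A8), consider the special case of Gaussian noise. This is an assumption made here purely for illustration; the argument above only needs independence and bounded moments. Suppose the v(t) are i.i.d. \({\mathcal {N}}(0,\sigma ^2)\) and \(|u(t)|\le \bar{u}\), where \(\sigma \) and \(\bar{u}\) are symbols introduced for this sketch. For fixed i, the sum \(\sum _{t=1}^{N}u(t)v(t+i)\) is Gaussian with zero mean and variance \(\sigma ^2\sum _{t=1}^{N}u(t)^2\), and the eighth moment of a zero-mean Gaussian with variance \(\tau ^2\) is \(105\,\tau ^8\). Hence

$$\begin{aligned} \frac{1}{N^8}{\mathbb {E}}\left[ \textstyle \sum \limits _{t=1}^{N}u(t)v(t+i)\right] ^8 =\frac{105\,\sigma ^8}{N^8}\left[ \textstyle \sum \limits _{t=1}^{N}u(t)^2\right] ^4 \le \frac{105\,\sigma ^8\bar{u}^8}{N^4}. \end{aligned}$$

Summing over \(i=1,\ldots ,n\) only changes the constant, so the bound has the form of (A5) in this special case.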

For (A6), we have

$$\begin{aligned}&{\mathbb {E}}(\widehat{\sigma ^2})^8\\&\quad =\frac{1}{(N-n)^8}{\mathbb {E}}(\Vert Y-\varPhi \hat{\theta }^{{{\,\textrm{LS}\,}}}\Vert _{2}^2)^8\\&\quad =\frac{1}{(N-n)^8}{\mathbb {E}}\left[ V^\textrm{T}V-V^\textrm{T}\varPhi (\varPhi ^\textrm{T}\varPhi )^{-1}\varPhi ^\textrm{T}V \right] ^8\\&\quad \le \frac{1}{(N-n)^8}2^7\left[ {\mathbb {E}}(V^\textrm{T}V)^8+{\mathbb {E}}(V^\textrm{T}\varPhi (\varPhi ^\textrm{T} \varPhi )^{-1}\varPhi ^\textrm{T}V)^8 \right] \\&\quad =\frac{1}{(N-n)^8}2^7\Big \{{\mathbb {E}}\Big [\textstyle \sum \limits _{t=1}^{N}v^2(t) \Big ]^8 \\&\qquad +{\mathbb {E}}\Big [\textstyle \sum \limits _{t_{1}=1}^{N}\textstyle \sum \limits _{t_{2}=1}^{N}v(t_{1}) v(t_{2})h_{t_{1},t_{2}} \Big ]^8 \Big \}\\&\quad \le \widetilde{M}_{2,{{\,\textrm{b}\,}}}, \end{aligned}$$

where \(h_{t_{1},t_{2}}\), the \((t_{1},t_{2})\)th element of \(\varPhi (\varPhi ^\textrm{T}\varPhi )^{-1}\varPhi ^\textrm{T}\), is bounded, and the last step holds because \(\{v(t)\}_{t=1}^{N}\) is independent and has bounded moments of order \(16+\delta \) for some \(\delta >0\), as stated in Assumption 2.

For (A7), we have

$$\begin{aligned}&{\mathbb {E}}\left( \Vert \hat{\theta }^{{{\,\textrm{LS}\,}}}\Vert _{2}^8 \right) \\&\quad ={\mathbb {E}}\left( \Vert \theta _{0}+(\varPhi ^\textrm{T}\varPhi )^{-1}\varPhi ^\textrm{T}V\Vert _{2}^8 \right) \\&\quad ={\mathbb {E}}\left( \Vert \theta _{0}\Vert _{2}^{2}\!+\!V^\textrm{T}\varPhi (\varPhi ^\textrm{T}\varPhi ) ^{-2}\varPhi ^\textrm{T}V\!+\!2\theta _{0}^\textrm{T}(\varPhi ^\textrm{T}\varPhi )^{-1}\varPhi ^\textrm{T}V\right) ^4\\&\quad \le 3^{3}\Big [ \Vert \theta _{0}\Vert _{2}^{8}+{\mathbb {E}}(V^\textrm{T}\varPhi (\varPhi ^\textrm{T}\varPhi )^{-2}\varPhi ^\textrm{T}V)^4\\&\qquad +{\mathbb {E}}(2\theta _{0}^\textrm{T}(\varPhi ^\textrm{T}\varPhi )^{-1}\varPhi ^\textrm{T}V)^4 \Big ]\\&\quad \le \widetilde{M}_{2,{{\,\textrm{b}\,}}}, \end{aligned}$$

where the second-to-last step applies [29, Theorem 2.2], and the last step uses Assumptions 1 and 2.
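
The power-of-a-sum steps in this proof rely on an elementary inequality of the form \(|a_{1}+\cdots +a_{m}|^{r}\le m^{r-1}(|a_{1}|^{r}+\cdots +|a_{m}|^{r})\) for \(r\ge 1\), which follows from convexity of \(x\mapsto |x|^{r}\). We assume this is the content of the cited moment inequality, but have not checked the reference. The sketch below spot-checks it on a small hypothetical grid for two terms with \(r=8\) and three terms with \(r=4\):

```python
import itertools

def cr_holds(terms, r):
    """Check |sum a_i|^r <= m^(r-1) * sum |a_i|^r for m = len(terms), r >= 1."""
    m = len(terms)
    lhs = abs(sum(terms)) ** r
    rhs = m ** (r - 1) * sum(abs(a) ** r for a in terms)
    return lhs <= rhs * (1 + 1e-12)   # small slack for floating-point rounding

# Hypothetical test grid; any real values would do.
grid = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
two_term_ok = all(cr_holds(pair, 8) for pair in itertools.product(grid, repeat=2))
three_term_ok = all(cr_holds(tri, 4) for tri in itertools.product(grid, repeat=3))
print(two_term_ok, three_term_ok)
```

Equality is attained when all terms are equal, so the constant \(m^{r-1}\) cannot be lowered for a given number of terms m.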

Now, using Lemmas A.1 and A.2, we shall derive (24), (25) and (26) in turn.

Proof of (24)

We first rewrite the difference between \(\overline{\mathscr {F}_{{{\,\textrm{EB}\,}}}}\) in (22) and \(W_{{{\,\textrm{b}\,}}}\) in (16b) as

$$\begin{aligned}&\overline{\mathscr {F}_{{{\,\textrm{EB}\,}}}}-W_{{{\,\textrm{b}\,}}}\nonumber \\&\quad =(\hat{\theta }^{{{\,\textrm{LS}\,}}}-\theta _{0})^\textrm{T}\hat{S}^{-1}\hat{\theta }^{{{\,\textrm{LS}\,}}}+\theta _{0}^\textrm{T}(\hat{S}^{-1}-P^{-1})\hat{\theta }^{{{\,\textrm{LS}\,}}}\nonumber \\&\qquad +\theta _{0}^\textrm{T}P^{-1}(\hat{\theta }^{{{\,\textrm{LS}\,}}}-\theta _{0})+[\log \det (\hat{S})-\log \det (P)]\nonumber \\&\quad =\Big (\frac{\varPhi ^\textrm{T}V}{N} \Big )^\textrm{T}N(\varPhi ^\textrm{T}\varPhi )^{-1}\hat{S}^{-1}\hat{\theta }^{{{\,\textrm{LS}\,}}}\nonumber \\&\qquad -\widehat{\sigma ^2}\theta _{0}^\textrm{T}P^{-1}(\varPhi ^\textrm{T}\varPhi )^{-1}\hat{S}^{-1}\hat{\theta }^{{{\,\textrm{LS}\,}}}\nonumber \\&\qquad +\theta _{0}^\textrm{T}P^{-1}N(\varPhi ^\textrm{T}\varPhi )^{-1}\frac{\varPhi ^\textrm{T}V}{N}\!+\!\log \det (P^{-1/2}\hat{S}P^{-1/2}). \end{aligned}$$
(A9)

It follows that:

$$\begin{aligned}&\sup _{\eta \in \widetilde{\varOmega }}\vert \overline{\mathscr {F}_{{{\,\textrm{EB}\,}}}}-W_{{{\,\textrm{b}\,}}}\vert \nonumber \\&\quad \le \sup _{\eta \in \widetilde{\varOmega }}\bigg \{\left| \left( \frac{\varPhi ^\textrm{T}V}{N} \right) ^\textrm{T}N(\varPhi ^\textrm{T}\varPhi )^{-1}\hat{S}^{-1}\hat{\theta }^{{{\,\textrm{LS}\,}}}\right| \nonumber \\&\qquad +\left| \widehat{\sigma ^2}\theta _{0}^\textrm{T}P^{-1}(\varPhi ^\textrm{T}\varPhi )^{-1}\hat{S}^{-1}\hat{\theta }^{{{\,\textrm{LS}\,}}}\right| \nonumber \\&\qquad +\bigg \vert \theta _{0}^\textrm{T}P^{-1}N(\varPhi ^\textrm{T}\varPhi )^{-1}\frac{\varPhi ^\textrm{T}V}{N} \bigg \vert \nonumber \\&\qquad + \max \Big \{\left| {{\,\textrm{Tr}\,}}[\widehat{\sigma ^{2}}(\varPhi ^\textrm{T}\varPhi )^{-1}\hat{S}^{-1}]\right| ,\nonumber \\&\qquad \left| {{\,\textrm{Tr}\,}}[\widehat{\sigma ^2}(\varPhi ^\textrm{T}\varPhi )^{-1}P^{-1}]\right| \Big \}\bigg \}, \end{aligned}$$
(A10)

where we apply

$$\begin{aligned}&{{\,\textrm{Tr}\,}}(I_{n}-P^{1/2}\hat{S}^{-1}P^{1/2})\\&\quad \le \log \det (P^{-1/2}\hat{S}P^{-1/2})\!\le \!{{\,\textrm{Tr}\,}}(P^{-1/2}\hat{S}P^{-1/2}\!-\!I_{n}) \end{aligned}$$

using [32, Lemma B.16] with \(A=P^{-1/2}\hat{S}P^{-1/2}\). Then, according to [32, Lemma B.1] and (A1a), there exists \(\widetilde{M}_{3,{{\,\textrm{b}\,}}}>0\), such that (A10) can be further derived as

$$\begin{aligned}&\sup _{\eta \in \widetilde{\varOmega }}\vert \overline{\mathscr {F}_{{{\,\textrm{EB}\,}}}}-W_{{{\,\textrm{b}\,}}}\vert \nonumber \\&\quad \le \widetilde{M}_{3,{{\,\textrm{b}\,}}}\left\| \frac{\varPhi ^\textrm{T}V}{N}\right\| _{2}\Vert N(\varPhi ^\textrm{T}\varPhi )^{-1}\Vert _{F}\Vert \hat{\theta }^{{{\,\textrm{LS}\,}}}\Vert _{2}\nonumber \\&\qquad +\frac{1}{N}\widetilde{M}_{3,{{\,\textrm{b}\,}}}\Vert \theta _{0}\Vert _{2}\widehat{\sigma ^2}\Vert N(\varPhi ^\textrm{T}\varPhi )^{-1}\Vert _{F}\Vert \hat{\theta }^{{{\,\textrm{LS}\,}}}\Vert _{2}\nonumber \\&\qquad +\widetilde{M}_{3,{{\,\textrm{b}\,}}}\Vert \theta _{0}\Vert _{2}\Vert N(\varPhi ^\textrm{T}\varPhi )^{-1}\Vert _{F}\left\| \frac{\varPhi ^\textrm{T}V}{N}\right\| _{2}\nonumber \\&\qquad +\frac{1}{N}\widetilde{M}_{3,{{\,\textrm{b}\,}}}\widehat{\sigma ^2}\Vert N(\varPhi ^\textrm{T}\varPhi )^{-1}\Vert _{F}. \end{aligned}$$
(A11)

According to [32, Lemmas B.5-B.6], we have

$$\begin{aligned}&{\mathbb {E}}\left( \sup _{\eta \in \widetilde{\varOmega }}\vert \overline{\mathscr {F}_{{{\,\textrm{EB}\,}}}}-W_{{{\,\textrm{b}\,}}}\vert \right) ^4\nonumber \\&\quad \le 2^6\widetilde{M}_{3,{{\,\textrm{b}\,}}}^4\Vert N(\varPhi ^\textrm{T}\varPhi )^{-1}\Vert _{F}^4\bigg \{\Big [{\mathbb {E}}\Big (\Big \Vert \frac{\varPhi ^\textrm{T}V}{N}\Big \Vert _{2}^8\Big ){\mathbb {E}}\Big (\Vert \hat{\theta }^{{{\,\textrm{LS}\,}}}\Vert _{2}^{8}\Big )\Big ]^{1/2}\nonumber \\&\qquad +\frac{1}{N^4}\Vert \theta _{0}\Vert _{2}^{4}\left[ {\mathbb {E}}\left( \widehat{\sigma ^2}\right) ^8{\mathbb {E}}\left( \Vert \hat{\theta }^{{{\,\textrm{LS}\,}}}\Vert _{2}^{8}\right) \right] ^{1/2}\nonumber \\&\qquad +\Vert \theta _{0}\Vert _{2}^{4}{\mathbb {E}}\Big (\Big \Vert \frac{\varPhi ^\textrm{T}V}{N}\Big \Vert _{2}^4\Big )\nonumber \\&\qquad +\frac{1}{N^4}{\mathbb {E}}\big [\big (\widehat{\sigma ^2}\big )^4 \big ]\bigg \}. \end{aligned}$$
(A12)

Thus, using Lemma A.2 and (A12), we can show that there exists \(\widetilde{M}_{4,{{\,\textrm{b}\,}}}>0\), such that

$$\begin{aligned} {\mathbb {E}}\left( \sup _{\eta \in \widetilde{\varOmega }}\vert \overline{\mathscr {F}_{{{\,\textrm{EB}\,}}}}-W_{{{\,\textrm{b}\,}}}\vert \right) ^4\le \frac{1}{N^2}\widetilde{M}_{4,{{\,\textrm{b}\,}}}. \end{aligned}$$
(A13)

\(\square \)
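
The log-determinant term in (A10) is controlled by a two-sided trace bound, \({{\,\textrm{Tr}\,}}(I_{n}-A^{-1})\le \log \det (A)\le {{\,\textrm{Tr}\,}}(A-I_{n})\) for symmetric positive definite A. In terms of the eigenvalues \(\lambda _{i}\) of A, this is the scalar inequality \(1-1/\lambda _{i}\le \log \lambda _{i}\le \lambda _{i}-1\) summed over i. The sketch below evaluates the three quantities for a hypothetical spectrum, chosen only for illustration:

```python
import math

def logdet_sandwich(eigenvalues):
    """Return (Tr(I - A^{-1}), log det A, Tr(A - I)) from the eigenvalues
    of a symmetric positive definite matrix A."""
    lower = sum(1.0 - 1.0 / lam for lam in eigenvalues)
    middle = sum(math.log(lam) for lam in eigenvalues)
    upper = sum(lam - 1.0 for lam in eigenvalues)
    return lower, middle, upper

# Hypothetical spectrum of A = P^{-1/2} S_hat P^{-1/2}. All eigenvalues are
# taken >= 1, which is what one gets when S_hat - P is positive semidefinite.
lower, middle, upper = logdet_sandwich([1.0, 1.05, 1.3, 2.0])
print(lower, middle, upper)
```

With eigenvalues at least 1 all three quantities are nonnegative, so the absolute value of the log-determinant is bounded by the larger of the two traces, which is how the maximum in (A10) arises.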

Proof of (25)

By the definition of the Frobenius norm, we have

$$\begin{aligned}&{\mathbb {E}}\left( \sup _{\eta \in \widetilde{\varOmega }} \left\| \frac{\partial ^2 \overline{\mathscr {F}_{{{\,\textrm{EB}\,}}}}}{\partial \eta \partial \eta ^\textrm{T}}-\frac{\partial ^2 W_{{{\,\textrm{b}\,}}}}{\partial \eta \partial \eta ^\textrm{T}}\right\| _{F}\right) ^4\nonumber \\&\quad \le {\mathbb {E}}\left( \sup _{\eta \in \widetilde{\varOmega }} \textstyle \sum \limits _{k=1}^{p}\textstyle \sum \limits _{l=1}^{p} \left| \dfrac{\partial ^2 \overline{\mathscr {F}_{{{\,\textrm{EB}\,}}}}}{\partial \eta _{k}\partial \eta _{l}}-\dfrac{\partial ^2 W_{{{\,\textrm{b}\,}}}}{\partial \eta _{k}\partial \eta _{l}}\right| ^2 \right) ^2\nonumber \\&\quad \le {\mathbb {E}}\left( \textstyle \sum \limits _{k=1}^{p}\textstyle \sum \limits _{l=1}^{p} \sup _{\eta \in \widetilde{\varOmega }} \left| \dfrac{\partial ^2 \overline{\mathscr {F}_{{{\,\textrm{EB}\,}}}}}{\partial \eta _{k}\partial \eta _{l}}-\dfrac{\partial ^2 W_{{{\,\textrm{b}\,}}}}{\partial \eta _{k}\partial \eta _{l}}\right| ^2 \right) ^2\nonumber \\&\quad \le 2^{p^2-1}\textstyle \sum \limits _{k=1}^{p}\textstyle \sum \limits _{l=1}^{p}{\mathbb {E}}\left( \sup _{\eta \in \widetilde{\varOmega }} \left| \dfrac{\partial ^2 \overline{\mathscr {F}_{{{\,\textrm{EB}\,}}}}}{\partial \eta _{k}\partial \eta _{l}}-\dfrac{\partial ^2 W_{{{\,\textrm{b}\,}}}}{\partial \eta _{k}\partial \eta _{l}}\right| \right) ^4, \end{aligned}$$
(A14)

where the last step is derived using [29, Theorem 2.2].

Recall [20, (A.6)–(A.7)] and matrix norm inequalities in [32, Lemma B.1]. It follows that:

$$\begin{aligned}&\sup _{\eta \in \widetilde{\varOmega }} \left| \frac{\partial ^2 \overline{\mathscr {F}_{{{\,\textrm{EB}\,}}}}}{\partial \eta _{k}\partial \eta _{l}}-\frac{\partial ^2 W_{{{\,\textrm{b}\,}}}}{\partial \eta _{k}\partial \eta _{l}}\right| \nonumber \\&\quad \le \sup _{\eta \in \widetilde{\varOmega }}\Bigg \{\left\| \frac{\varPhi ^\textrm{T}V}{N}\right\| _{2}\Vert N(\varPhi ^\textrm{T}\varPhi )^{-1}\Vert _{F} \left\| \frac{\partial ^2 \hat{S}^{-1}}{\partial \eta _{k}\partial \eta _{l}}\right\| _{F}\Vert \hat{\theta }^{{{\,\textrm{LS}\,}}}\Vert _{2}\nonumber \\&\qquad +\Vert \theta _{0}\Vert _{2}\left\| \frac{\partial ^2 \hat{S}^{-1}}{\partial \eta _{k}\partial \eta _{l}}-\frac{\partial ^2 P^{-1}}{\partial \eta _{k}\partial \eta _{l}}\right\| _{F}\Vert \hat{\theta }^{{{\,\textrm{LS}\,}}}\Vert _{2}\nonumber \\&\qquad +\Vert \theta _{0}\Vert _{2}\left\| \frac{\partial ^2 P^{-1}}{\partial \eta _{k}\partial \eta _{l}}\right\| _{F}\Vert N(\varPhi ^\textrm{T}\varPhi )^{-1}\Vert _{F}\left\| \frac{\varPhi ^\textrm{T}V}{N}\right\| _{2}\nonumber \\&\qquad +\sqrt{n}\left\| \frac{\partial \hat{S}^{-1}}{\partial \eta _{l}}-\frac{\partial P^{-1}}{\partial \eta _{l}}\right\| _{F} \left\| \frac{\partial P}{\partial \eta _{k}}\right\| _{F}\nonumber \\&\qquad +\frac{1}{N}\sqrt{n}\widehat{\sigma ^2}\Vert \hat{S}^{-1}\Vert _{F}\Vert N(\varPhi ^\textrm{T}\varPhi )^{-1}\Vert _{F}\nonumber \\&\qquad \cdot \Vert P^{-1}\Vert _{F}\left\| \frac{\partial ^2 P}{\partial \eta _{k}\partial \eta _{l}}\right\| _{F}\Bigg \}. \end{aligned}$$
(A15)

Then, according to (A1a), (A1b), (A1d), (A2a), and (A2b), there exists \(\widetilde{M}_{5,{{\,\textrm{b}\,}}}>0\), such that we can rewrite (A15) as

$$\begin{aligned}&\sup _{\eta \in \widetilde{\varOmega }} \left| \frac{\partial ^2 \overline{\mathscr {F}_{{{\,\textrm{EB}\,}}}}}{\partial \eta _{k}\partial \eta _{l}}-\frac{\partial ^2 W_{{{\,\textrm{b}\,}}}}{\partial \eta _{k}\partial \eta _{l}}\right| \nonumber \\&\quad \le \widetilde{M}_{5,{{\,\textrm{b}\,}}}\left\| \frac{\varPhi ^\textrm{T}V}{N}\right\| _{2}\Vert N(\varPhi ^\textrm{T}\varPhi )^{-1}\Vert _{F}\Vert \hat{\theta }^{{{\,\textrm{LS}\,}}}\Vert _{2}\nonumber \\&\qquad +\widetilde{M}_{5,{{\,\textrm{b}\,}}}\left\| \frac{\varPhi ^\textrm{T}V}{N}\right\| _{2}\Vert N(\varPhi ^\textrm{T}\varPhi )^{-1}\Vert _{F}\nonumber \\&\qquad +\widetilde{M}_{5,{{\,\textrm{b}\,}}}\frac{1}{N}\widehat{\sigma ^2}\Vert N(\varPhi ^\textrm{T}\varPhi )^{-1}\Vert _{F}\Vert \hat{\theta }^{{{\,\textrm{LS}\,}}}\Vert _{2}\nonumber \\&\qquad +\widetilde{M}_{5,{{\,\textrm{b}\,}}}\frac{1}{N}\widehat{\sigma ^2}\Vert N(\varPhi ^\textrm{T}\varPhi )^{-1}\Vert _{F}. \end{aligned}$$
(A16)

Moreover, using (A16) and [29, Theorems 2.2, 3.1], we can further show that

$$\begin{aligned}&{\mathbb {E}}\left( \sup _{\eta \in \widetilde{\varOmega }} \left| \frac{\partial ^2 \overline{\mathscr {F}_{{{\,\textrm{EB}\,}}}}}{\partial \eta _{k}\partial \eta _{l}}-\frac{\partial ^2 W_{{{\,\textrm{b}\,}}}}{\partial \eta _{k}\partial \eta _{l}}\right| \right) ^4\nonumber \\&\quad \le 2^9 \widetilde{M}_{5,{{\,\textrm{b}\,}}}^4\Vert N(\varPhi ^\textrm{T}\varPhi )^{-1}\Vert _{F}^4\Bigg \{ \Big [{\mathbb {E}}\Big (\Big \Vert \frac{\varPhi ^\textrm{T}V}{N}\Big \Vert _{2}^8\Big ) {\mathbb {E}}\Big (\Vert \hat{\theta }^{{{\,\textrm{LS}\,}}}\Vert _{2}^8\Big ) \Big ]^{1/2} \nonumber \\&\qquad + {\mathbb {E}}\Big (\Big \Vert \frac{\varPhi ^\textrm{T}V}{N}\Big \Vert _{2}^4\Big ) +\frac{1}{N^4}\left[ {\mathbb {E}}\left( (\widehat{\sigma ^2})^8\right) {\mathbb {E}}(\Vert \hat{\theta }^{{{\,\textrm{LS}\,}}}\Vert _{2}^8) \right] ^{1/2}\nonumber \\&\qquad +\frac{1}{N^4}{\mathbb {E}}\big (\widehat{\sigma ^2}\big )^4\Bigg \}. \end{aligned}$$
(A17)

Combining (A14) with Lemma A.2 and (A17), there exists \(\widetilde{M}_{6,{{\,\textrm{b}\,}}}>0\), such that

$$\begin{aligned} {\mathbb {E}}\left( \sup _{\eta \in \widetilde{\varOmega }} \left\| \frac{\partial ^2 \overline{\mathscr {F}_{{{\,\textrm{EB}\,}}}}}{\partial \eta \partial \eta ^\textrm{T}}-\frac{\partial ^2 W_{{{\,\textrm{b}\,}}}}{\partial \eta \partial \eta ^\textrm{T}}\right\| _{F}\right) ^4 \le \frac{1}{N^2} \widetilde{M}_{6,{{\,\textrm{b}\,}}}. \end{aligned}$$
(A18)

\(\square \)
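
The step from (A17) to the \(1/N^2\) rate in (A18) can be spelled out as follows. The sketch writes \(X=\Vert \varPhi ^\textrm{T}V/N\Vert _{2}\) and \(Y=\Vert \hat{\theta }^{{{\,\textrm{LS}\,}}}\Vert _{2}\) (shorthand introduced here, not in the original), uses the Cauchy–Schwarz inequality together with (A5)–(A7), and assumes, as the surrounding argument implicitly does, that \(\Vert N(\varPhi ^\textrm{T}\varPhi )^{-1}\Vert _{F}\) stays bounded in N:

$$\begin{aligned}&{\mathbb {E}}(X^4Y^4)\le \big [{\mathbb {E}}(X^8)\,{\mathbb {E}}(Y^8)\big ]^{1/2}\le \Big (\frac{\widetilde{M}_{2,{{\,\textrm{b}\,}}}}{N^4}\cdot \widetilde{M}_{2,{{\,\textrm{b}\,}}}\Big )^{1/2}=\frac{\widetilde{M}_{2,{{\,\textrm{b}\,}}}}{N^2},\\&{\mathbb {E}}(X^4)\le \big [{\mathbb {E}}(X^8)\big ]^{1/2}\le \frac{\widetilde{M}_{2,{{\,\textrm{b}\,}}}^{1/2}}{N^2},\\&\frac{1}{N^4}\big [{\mathbb {E}}\big ((\widehat{\sigma ^2})^8\big )\,{\mathbb {E}}(Y^8)\big ]^{1/2}\le \frac{\widetilde{M}_{2,{{\,\textrm{b}\,}}}}{N^4},\qquad \frac{1}{N^4}{\mathbb {E}}\big [(\widehat{\sigma ^2})^4\big ]\le \frac{\widetilde{M}_{2,{{\,\textrm{b}\,}}}^{1/2}}{N^4}. \end{aligned}$$

Every term inside the braces of (A17) is therefore at most of order \(1/N^2\), with the two terms involving \(\widehat{\sigma ^2}\) decaying faster. This appears to be where the \(1/N^2\) factor in (A18) comes from; the same bookkeeping applies to (A12) and (A20).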

Proof of (26)

We first consider the boundedness of \({\mathbb {E}}\left\| \left. {\partial \overline{\mathscr {F}_{{{\,\textrm{EB}\,}}}}}/{\partial \eta }\right| _{\eta ^{*}_{{{\,\textrm{b}\,}}}} \right\| _{2}^4\). Under Assumption 6, \(\eta _{{{\,\textrm{b}\,}}}^{*}\) satisfies the first-order optimality condition, i.e., \({\partial W_{{{\,\textrm{b}\,}}}}/{\partial \eta }\vert _{\eta ^{*}_{{{\,\textrm{b}\,}}}}=0\). This yields

$$\begin{aligned}&{\mathbb {E}}\left\| \left. \frac{\partial \overline{\mathscr {F}_{{{\,\textrm{EB}\,}}}}}{\partial \eta }\right| _{\eta ^{*}_{{{\,\textrm{b}\,}}}} \right\| _{2}^4\nonumber \\&\quad ={\mathbb {E}}\left\| \left. \left( \frac{\partial \overline{\mathscr {F}_{{{\,\textrm{EB}\,}}}}}{\partial \eta } -\frac{\partial W_{{{\,\textrm{b}\,}}}}{\partial \eta } \right) \right| _{\eta ^{*}_{{{\,\textrm{b}\,}}}} \right\| _{2}^4\nonumber \\&\quad \le {\mathbb {E}}\left( \sup _{\eta \in \widetilde{\varOmega }}\left\| \frac{\partial \overline{\mathscr {F}_{{{\,\textrm{EB}\,}}}}}{\partial \eta } -\frac{\partial W_{{{\,\textrm{b}\,}}}}{\partial \eta } \right\| _{2}\right) ^4\nonumber \\&\quad \le {\mathbb {E}}\left[ \textstyle \sum \limits _{k=1}^{p}\sup _{\eta \in \widetilde{\varOmega }}\left( \frac{\partial \overline{\mathscr {F}_{{{\,\textrm{EB}\,}}}}}{\partial \eta _{k}} -\frac{\partial W_{{{\,\textrm{b}\,}}}}{\partial \eta _{k}} \right) ^2 \right] ^2\nonumber \\&\quad \le 2^{p-1}\textstyle \sum \limits _{k=1}^{p}{\mathbb {E}}\left( \sup _{\eta \in \widetilde{\varOmega }}\left| \frac{\partial \overline{\mathscr {F}_{{{\,\textrm{EB}\,}}}}}{\partial \eta _{k}} -\frac{\partial W_{{{\,\textrm{b}\,}}}}{\partial \eta _{k}} \right| \right) ^4, \end{aligned}$$
(A19)

where the last step is derived from [29, Theorem 2.2].

For any \(k=1,\ldots ,p\), using [20, (A.11)–(A.15)], we know that

$$\begin{aligned}&\sup _{\eta \in \widetilde{\varOmega }}\left| \frac{\partial \overline{\mathscr {F}_{{{\,\textrm{EB}\,}}}}}{\partial \eta _{k}} -\frac{\partial W_{{{\,\textrm{b}\,}}}}{\partial \eta _{k}} \right| \\&\quad \le \sup _{\eta \in \widetilde{\varOmega }} \Bigg \{ \Vert \theta _{0}\Vert _{2}\left\| \frac{\partial \hat{S}^{-1}}{\partial \eta _{k}}-\frac{\partial P^{-1}}{\partial \eta _{k}}\right\| _{F}\Vert \hat{\theta }^{{{\,\textrm{LS}\,}}}\Vert _{2}\\&\qquad +\left\| \frac{\varPhi ^\textrm{T}V}{N}\right\| _{2}\Vert N(\varPhi ^\textrm{T}\varPhi )^{-1}\Vert _{F}\left\| \frac{\partial \hat{S}^{-1}}{\partial \eta _{k}}\right\| _{F}\Vert \hat{\theta }^{{{\,\textrm{LS}\,}}}\Vert _{2}\\&\qquad +\left\| \frac{\varPhi ^\textrm{T}V}{N}\right\| _{2}\Vert N(\varPhi ^\textrm{T}\varPhi )^{-1}\Vert _{F}\left\| \frac{\partial P^{-1}}{\partial \eta _{k}}\right\| _{F}\Vert \theta _{0}\Vert _{2}\\&\qquad +\!\!\frac{1}{N}\sqrt{n}\widehat{\sigma ^2}\Vert \hat{S}^{-1}\Vert _{F}\Vert N(\varPhi ^\textrm{T}\varPhi )^{-1}\Vert _{F}\Vert P^{-1}\Vert _{F}\left\| \frac{\partial P}{\partial \eta _{k}}\right\| _{F}\!\!\Bigg \}. \end{aligned}$$

Applying (A1a)–(A1c), (A2a), and [32, Lemma B.1], we can further derive that there exists \(\widetilde{M}_{7,{{\,\textrm{b}\,}}}>0\), such that

$$\begin{aligned}&\sup _{\eta \in \widetilde{\varOmega }}\left| \frac{\partial \overline{\mathscr {F}_{{{\,\textrm{EB}\,}}}}}{\partial \eta _{k}} -\frac{\partial W_{{{\,\textrm{b}\,}}}}{\partial \eta _{k}} \right| \\&\quad \le \widetilde{M}_{7,{{\,\textrm{b}\,}}} \frac{1}{N}\widehat{\sigma ^2}\Vert N(\varPhi ^\textrm{T}\varPhi )^{-1}\Vert _{F}\Vert \hat{\theta }^{{{\,\textrm{LS}\,}}}\Vert _{2}\\&\qquad +\widetilde{M}_{7,{{\,\textrm{b}\,}}}\left\| \frac{\varPhi ^\textrm{T}V}{N}\right\| _{2}\Vert N(\varPhi ^\textrm{T}\varPhi )^{-1}\Vert _{F} \Vert \hat{\theta }^{{{\,\textrm{LS}\,}}}\Vert _{2}\\&\qquad +\widetilde{M}_{7,{{\,\textrm{b}\,}}}\left\| \frac{\varPhi ^\textrm{T}V}{N}\right\| _{2}\Vert N(\varPhi ^\textrm{T}\varPhi )^{-1}\Vert _{F}\\&\qquad +\widetilde{M}_{7,{{\,\textrm{b}\,}}}\frac{1}{N}\widehat{\sigma ^2}\Vert N(\varPhi ^\textrm{T}\varPhi )^{-1}\Vert _{F}. \end{aligned}$$

Then, using [29, Theorems 2.2, 3.1], we have

$$\begin{aligned}&{\mathbb {E}}\left( \sup _{\eta \in \widetilde{\varOmega }}\left| \frac{\partial \overline{\mathscr {F}_{{{\,\textrm{EB}\,}}}}}{\partial \eta _{k}} -\frac{\partial W_{{{\,\textrm{b}\,}}}}{\partial \eta _{k}} \right| \right) ^4\nonumber \\&\quad \le 2^9\widetilde{M}_{7,{{\,\textrm{b}\,}}}^4\left\{ \frac{1}{N^4}\Vert N(\varPhi ^\textrm{T}\varPhi )^{-1}\Vert _{F}^4{\mathbb {E}}\left( \widehat{\sigma ^2} \Vert \hat{\theta }^{{{\,\textrm{LS}\,}}}\Vert _{2}\right) ^4 \right. \nonumber \\&\qquad +\Vert N(\varPhi ^\textrm{T}\varPhi )^{-1}\Vert _{F}^4\Big [ {\mathbb {E}}\Big (\Big \Vert \frac{\varPhi ^\textrm{T}V}{N}\Big \Vert _{2}\Big )^8 {\mathbb {E}}\Big ( \Vert \hat{\theta }^{{{\,\textrm{LS}\,}}}\Vert _{2}\Big )^8 \Big ]^{1/2}\nonumber \\&\qquad +\Vert N(\varPhi ^\textrm{T}\varPhi )^{-1}\Vert _{F}^4 {\mathbb {E}}\Big (\Big \Vert \frac{\varPhi ^\textrm{T}V}{N}\Big \Vert _{2}\Big )^4\nonumber \\&\qquad +\left. \frac{1}{N^4}\Vert N(\varPhi ^\textrm{T}\varPhi )^{-1}\Vert _{F}^4{\mathbb {E}}\big ( \widehat{\sigma ^2}\big )^4\right\} . \end{aligned}$$
(A20)

Analogously, applying Lemma A.2 and (A20) to (A19) shows that there exists \(\widetilde{M}_{8,{{\,\textrm{b}\,}}}>0\), such that

$$\begin{aligned} {\mathbb {E}}\bigg \Vert \frac{\partial \overline{\mathscr {F}_{{{\,\textrm{EB}\,}}}}}{\partial \eta }\Big \vert _{\eta =\eta _{{{\,\textrm{b}\,}}}^{*}} \bigg \Vert _{2}^4 \le \frac{1}{N^2}\widetilde{M}_{8,{{\,\textrm{b}\,}}}. \end{aligned}$$
(A21)

\(\square \)

Define

$$\begin{aligned} \check{M}_{{{\,\textrm{b}\,}}}=\max \big \{\widetilde{M}_{4,{{\,\textrm{b}\,}}},\widetilde{M}_{6,{{\,\textrm{b}\,}}}, \widetilde{M}_{8,{{\,\textrm{b}\,}}}\big \}, \end{aligned}$$
(A22)

and the proof is complete.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ju, Y., Mu, B. & Chen, T. On convergence of covariance matrix of empirical Bayes hyper-parameter estimator. Control Theory Technol. 22, 149–162 (2024). https://doi.org/10.1007/s11768-024-00211-z

  • DOI: https://doi.org/10.1007/s11768-024-00211-z
