Abstract
We develop a fast variational approximation scheme for Gaussian process (GP) regression, where the spectrum of the covariance function is subjected to a sparse approximation. Our approach enables uncertainty in covariance function hyperparameters to be treated without using Monte Carlo methods and is robust to overfitting. Our article makes three contributions. First, we present a variational Bayes algorithm for fitting sparse spectrum GP regression models that uses nonconjugate variational message passing to derive fast and efficient updates. Second, we propose a novel adaptive neighbourhood technique for obtaining predictive inference that is effective in dealing with nonstationarity. Regression is performed locally at each point to be predicted, with the neighbourhood determined by a distance measure based on lengthscales estimated from an initial fit. Because dimensions are weighted according to their lengthscales, variables of little relevance are downweighted, leading to automatic variable selection and improved prediction. Third, we introduce a technique for accelerating convergence in nonconjugate variational message passing by adapting step sizes in the direction of the natural gradient of the lower bound. Our adaptive strategy can be easily implemented and empirical results indicate significant speedups.
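The adaptive neighbourhood idea can be illustrated with a short sketch. This is not the authors' exact procedure; the function name and the use of inverse lengthscales as per-dimension weights are illustrative assumptions, showing only how lengthscale weighting produces automatic downweighting of irrelevant variables.

```python
import numpy as np

def weighted_neighbourhood(X, x_star, lengthscales, k):
    """Return indices of the k training points closest to x_star under a
    lengthscale-weighted Euclidean distance. Dimensions with large estimated
    lengthscales (low relevance) contribute little to the distance."""
    # Scale each dimension by the inverse of its lengthscale from an initial fit.
    diffs = (X - x_star) / lengthscales
    d2 = np.sum(diffs ** 2, axis=1)
    return np.argsort(d2)[:k]
```

For example, a point that is far away only in a dimension with a very large lengthscale will still be selected as a neighbour, since that dimension is effectively ignored.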
*(Figures 1–11 appear at this point in the published article; the images are not reproduced here.)*
References
Amari, S.: Natural gradient works efficiently in learning. Neural Comput. 10, 251–276 (1998)
Attias, H.: Inferring parameters and structure of latent variable models by variational Bayes. In: Laskey, K., Prade, H. (eds.) Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence, pp. 21–30. Morgan Kaufmann, San Francisco, CA (1999)
Attias, H.: A variational Bayesian framework for graphical models. In: Solla, S.A., Leen, T.K., Müller, K.-R. (eds.) Advances in Neural Information Processing Systems 12, pp. 209–215. MIT Press, Cambridge, MA (2000)
Blei, D.M., Jordan, M.I.: Variational inference for Dirichlet process mixtures. Bayesian Anal. 1, 121–144 (2006)
Boughton, W.: The Australian water balance model. Environ. Model. Softw. 19, 943–956 (2004)
Braun, M., McAuliffe, J.: Variational inference for large-scale models of discrete choice. J. Am. Stat. Assoc. 105, 324–335 (2010)
Gelman, A.: Prior distributions for variance parameters in hierarchical models. Bayesian Anal. 1, 515–533 (2006)
Gramacy, R.B., Apley, D.W.: Local Gaussian process approximation for large computer experiments. J. Comput. Gr. Stat. To appear (2014)
Haas, T.C.: Local prediction of a spatio-temporal process with an application to wet sulfate deposition. J. Am. Stat. Assoc. 90, 1189–1199 (1995)
Hastie, T., Tibshirani, R.: Discriminant adaptive nearest neighbor classification. IEEE Trans. Pattern Anal. Mach. Intell. 18, 607–616 (1996)
Hoffman, M.D., Blei, D.M., Wang, C., Paisley, J.: Stochastic variational inference. J. Mach. Learn. Res. 14, 1303–1347 (2013)
Honkela, A., Valpola, H., Karhunen, J.: Accelerating cyclic update algorithms for parameter estimation by pattern searches. Neural Process. Lett. 17, 191–203 (2003)
Huang, H., Yang, B., Hsu, C.: Triple jump acceleration for the EM algorithm. In: Han, J., Wah, B.W., Raghavan, V., Wu, X., Rastogi, R. (eds.) Proceedings of the 5th IEEE International Conference on Data Mining, pp. 649–652. IEEE Computer Society, Washington, DC, USA (2005)
Kim, H.-M., Mallick, B.K., Holmes, C.C.: Analyzing nonstationary spatial data using piecewise Gaussian processes. J. Am. Stat. Assoc. 100, 653–668 (2005)
Knowles, D.A., Minka, T.P.: Non-conjugate variational message passing for multinomial and binary regression. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P., Pereira, F., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24, pp. 1701–1709. Curran Associates, Inc., Red Hook, NY (2011)
Lázaro-Gredilla, M., Quiñonero-Candela, J., Rasmussen, C.E., Figueiras-Vidal, A.R.: Sparse spectrum Gaussian process regression. J. Mach. Learn. Res. 11, 1865–1881 (2010)
Lázaro-Gredilla, M., Titsias, M.K.: Variational heteroscedastic Gaussian process regression. In: Getoor, L., Scheffer, T. (eds.) Proceedings of the 28th International Conference on Machine Learning, pp. 841–848. Omnipress, Madison, WI, USA (2011)
Lindgren, F., Rue, H., Lindström, J.: An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. J. R. Stat. Soc. Ser. B 73, 423–498 (2011)
Magnus, J.R., Neudecker, H.: Matrix Differential Calculus with Applications in Statistics and Econometrics. Wiley, Chichester, UK (1988)
Nguyen-Tuong, D., Seeger, M., Peters, J.: Model learning with local Gaussian process regression. Adv. Robot. 23, 2015–2034 (2009)
Nott, D.J., Tan, S.L., Villani, M., Kohn, R.: Regression density estimation with variational methods and stochastic approximation. J. Comput. Gr. Stat. 21, 797–820 (2012)
Ormerod, J.T., Wand, M.P.: Explaining variational approximations. Am. Stat. 64, 140–153 (2010)
Park, S., Choi, S.: Hierarchical Gaussian process regression. In: Sugiyama, M., Yang, Q. (eds.) Proceedings of the 2nd Asian Conference on Machine Learning, pp. 95–110 (2010)
Qi, Y., Jaakkola, T.S.: Parameter expanded variational Bayesian methods. In: Schölkopf, B., Platt, J., Hofmann, T. (eds.) Advances in Neural Information Processing Systems 19, pp. 1097–1104. MIT Press, Cambridge (2006)
Quinlan, R.: Combining instance-based and model-based learning. In: Proceedings of the Tenth International Conference on Machine Learning, pp. 236–243. Morgan Kaufmann, Amherst, MA (1993)
Quiñonero-Candela, J., Rasmussen, C.E.: A unifying view of sparse approximate Gaussian process regression. J. Mach. Learn. Res. 6, 1939–1959 (2005)
Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA (2006)
Ren, Q., Banerjee, S., Finley, A.O., Hodges, J.S.: Variational Bayesian methods for spatial data analysis. Comput. Stat. Data Anal. 55, 3197–3217 (2011)
Salakhutdinov, R., Roweis, S.: Adaptive overrelaxed bound optimization methods. In: Fawcett, T., Mishra, N. (eds.) Proceedings of the 20th International Conference on Machine Learning, pp. 664–671. AAAI Press, Menlo Park, CA (2003)
Snelson, E., Ghahramani, Z.: Sparse Gaussian processes using pseudo-inputs. In: Weiss, Y., Schölkopf, B., Platt, J. (eds.) Advances in Neural Information Processing Systems 18, pp. 1257–1264. MIT Press, Cambridge, MA (2006)
Snelson, E., Ghahramani, Z.: Local and global sparse Gaussian process approximations. In: Meila, M., Shen, X. (eds) JMLR Workshop and Conference Proceedings, vol. 2: AISTATS 2007, pp. 524–531 (2007)
Stan Development Team: RStan: the R interface to Stan, Version 2.5.0. http://mc-stan.org/rstan.html (2014)
Stein, M.L., Chi, Z., Welty, L.J.: Approximating likelihoods for large spatial data sets. J. R. Stat. Soc. Ser. B 66, 275–296 (2004)
Tan, L.S.L., Nott, D.J.: Variational inference for generalized linear mixed models using partially non-centered parametrizations. Stat. Sci. 28, 168–188 (2013)
Tan, L.S.L., Nott, D.J.: A stochastic variational framework for fitting and diagnosing generalized linear mixed models. Bayesian Anal. 9, 963–1004 (2014). doi:10.1214/14-BA885
Titsias, M.K.: Variational learning of inducing variables in sparse Gaussian processes. In: van Dyk, D., Welling, M. (eds.) Proceedings of the 12th International Conference on Artificial Intelligence and Statistics, pp. 567–574 (2009)
Urtasun, R., Darrell, T.: Sparse probabilistic regression for activity-independent human pose inference. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8 (2008)
Vecchia, A.V.: Estimation and model identification for continuous spatial processes. J. R. Stat. Soc. Ser. B 50, 297–312 (1988)
Walder, C., Kim, K.I., Schölkopf, B.: Sparse multiscale Gaussian process regression. In: McCallum, A., Roweis, S. (eds.) Proceedings of the 25th International Conference on Machine Learning, pp. 1112–1119. ACM Press, New York (2008)
Wand, M.P., Ormerod, J.T., Padoan, S.A., Frühwirth, R.: Mean field variational Bayes for elaborate distributions. Bayesian Anal. 6, 847–900 (2011)
Wand, M.P.: Fully simplified multivariate normal updates in non-conjugate variational message passing. J. Mach. Learn. Res. 15, 1351–1369 (2014)
Wang, B., Titterington, D.M.: Inadequacy of interval estimates corresponding to variational Bayesian approximations. In: Cowell, R. G., Ghahramani, Z. (eds.) Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics, pp. 373–380. Society for Artificial Intelligence and Statistics (2005)
Wang, B., Titterington, D.M.: Convergence properties of a general algorithm for calculating variational Bayesian estimates for a normal mixture model. Bayesian Anal. 3, 625–650 (2006)
Winn, J., Bishop, C.M.: Variational message passing. J. Mach. Learn. Res. 6, 661–694 (2005)
Acknowledgments
We thank Lucy Marshall for supplying the rainfall-runoff data set. Linda Tan was partially supported as part of the Singapore Delft Water Alliance’s tropical reservoir research programme. David Nott, Ajay Jasra and Victor Ong’s research was supported by a Singapore Ministry of Education Academic Research Fund Tier 2 grant (R-155-000-143-112). We also thank the referees and associate editor for their comments, which have helped improve the manuscript.
Appendices
Appendix 1: Derivation of \(E_q(Z)\) and \(E_q(Z^TZ)\)
Lemma 1
Suppose \(\lambda \sim N(\mu ,{\varSigma })\) and \(t_1\), \(t_2\) are fixed vectors of the same length as \(\lambda \). Let \(t_{12}^-=t_1-t_2\) and \(t_{12}^+=t_1+t_2\). Then
\[
E[\cos (\lambda ^Tt_1)\cos (\lambda ^Tt_2)]=\tfrac{1}{2}\left\{ \exp (-\tfrac{1}{2}{t_{12}^-}^T{\varSigma } t_{12}^-)\cos (\mu ^Tt_{12}^-)+\exp (-\tfrac{1}{2}{t_{12}^+}^T{\varSigma } t_{12}^+)\cos (\mu ^Tt_{12}^+)\right\} ,
\]
\[
E[\sin (\lambda ^Tt_1)\sin (\lambda ^Tt_2)]=\tfrac{1}{2}\left\{ \exp (-\tfrac{1}{2}{t_{12}^-}^T{\varSigma } t_{12}^-)\cos (\mu ^Tt_{12}^-)-\exp (-\tfrac{1}{2}{t_{12}^+}^T{\varSigma } t_{12}^+)\cos (\mu ^Tt_{12}^+)\right\} ,
\]
\[
E[\sin (\lambda ^Tt_1)\cos (\lambda ^Tt_2)]=\tfrac{1}{2}\left\{ \exp (-\tfrac{1}{2}{t_{12}^-}^T{\varSigma } t_{12}^-)\sin (\mu ^Tt_{12}^-)+\exp (-\tfrac{1}{2}{t_{12}^+}^T{\varSigma } t_{12}^+)\sin (\mu ^Tt_{12}^+)\right\} .
\]
By setting \(t_2=0\) in the first and third expressions, we get
\[
E[\cos (\lambda ^Tt_1)]=\exp (-\tfrac{1}{2}t_1^T{\varSigma } t_1)\cos (\mu ^Tt_1), \qquad E[\sin (\lambda ^Tt_1)]=\exp (-\tfrac{1}{2}t_1^T{\varSigma } t_1)\sin (\mu ^Tt_1).
\]
Proof
\(E[\exp \{i\lambda ^T(t_1-t_2)\}]=\exp \{i \mu ^T (t_1-t_2)- \tfrac{1}{2}(t_1-t_2)^T {\varSigma } (t_1-t_2)\}\) implies
\[
E[\cos \{\lambda ^T(t_1-t_2)\}]=\exp (-\tfrac{1}{2}{t_{12}^-}^T{\varSigma } t_{12}^-)\cos (\mu ^Tt_{12}^-) \quad (17)
\]
and
\[
E[\sin \{\lambda ^T(t_1-t_2)\}]=\exp (-\tfrac{1}{2}{t_{12}^-}^T{\varSigma } t_{12}^-)\sin (\mu ^Tt_{12}^-). \quad (18)
\]
Replacing \(t_2\) by \(-t_2\), we get
\[
E[\cos \{\lambda ^T(t_1+t_2)\}]=\exp (-\tfrac{1}{2}{t_{12}^+}^T{\varSigma } t_{12}^+)\cos (\mu ^Tt_{12}^+) \quad (19)
\]
and
\[
E[\sin \{\lambda ^T(t_1+t_2)\}]=\exp (-\tfrac{1}{2}{t_{12}^+}^T{\varSigma } t_{12}^+)\sin (\mu ^Tt_{12}^+). \quad (20)
\]
Since \(2\cos a\cos b=\cos (a-b)+\cos (a+b)\), \(2\sin a\sin b=\cos (a-b)-\cos (a+b)\) and \(2\sin a\cos b=\sin (a-b)+\sin (a+b)\), (17) + (19) gives the first equation of the lemma, (17) − (19) gives the second and (18) + (20) gives the third. \(\square \)
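Lemma 1 can also be checked numerically. The sketch below compares a Monte Carlo estimate of \(E[\cos (\lambda ^Tt_1)\cos (\lambda ^Tt_2)]\) against the closed form in the first identity; the particular values of \(\mu \), \({\varSigma }\), \(t_1\), \(t_2\) are arbitrary illustrative choices.

```python
import numpy as np

# Monte Carlo check of the first identity of Lemma 1 (illustrative values).
rng = np.random.default_rng(0)
d = 3
mu = rng.normal(size=d)
Sigma = 0.2 * np.eye(d)               # any positive definite covariance works
t1, t2 = rng.normal(size=d), rng.normal(size=d)

# Monte Carlo estimate of E[cos(lam' t1) cos(lam' t2)] under lam ~ N(mu, Sigma).
lam = rng.multivariate_normal(mu, Sigma, size=200_000)
mc = np.mean(np.cos(lam @ t1) * np.cos(lam @ t2))

# Closed form from the lemma, with t- = t1 - t2 and t+ = t1 + t2.
tm, tp = t1 - t2, t1 + t2
closed = 0.5 * (np.exp(-0.5 * tm @ Sigma @ tm) * np.cos(mu @ tm)
                + np.exp(-0.5 * tp @ Sigma @ tp) * np.cos(mu @ tp))
```

With 200,000 draws the two quantities should agree to roughly two decimal places.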
Using Lemma 1, we have
where
![](http://media.springernature.com/full/springer-static/image/art%3A10.1007%2Fs11222-015-9600-7/MediaObjects/11222_2015_9600_Equ46_HTML.gif)
and \(t_{ir}=s_r \odot x_i\) for \(i=1,\dots ,n\), \(r=1,\dots ,m\). We also have \(E_q(Z^TZ)=\sum _{i=1}^n E_q(Z_iZ_i^T)\), where \(E_q(Z_iZ_i^T)=\left[ \begin{matrix} P_i & Q_i^T \\ Q_i & R_i \end{matrix}\right] \) and \(P_i\), \(Q_i\), \(R_i\) are \(m\times m\) matrices with \((r,l)\) entries
\[
(P_i)_{rl}=E_q[\cos (\lambda ^Tt_{ir})\cos (\lambda ^Tt_{il})],\quad (Q_i)_{rl}=E_q[\sin (\lambda ^Tt_{ir})\cos (\lambda ^Tt_{il})],\quad (R_i)_{rl}=E_q[\sin (\lambda ^Tt_{ir})\sin (\lambda ^Tt_{il})],
\]
each evaluated using Lemma 1 with \(t_{irl}^-=t_{ir}-t_{il}\), \(t_{irl}^+=t_{ir}+t_{il}\) for \(r=1,\dots ,m\), \(l=1,\dots ,m\).
Appendix 2: Derivation of lower bound
From (6), the lower bound is given by
where
The terms in the lower bound can be evaluated as follows:
Putting these terms together and making use of the updates in steps 5 and 6 of Algorithm 1 gives the lower bound in (12).
Appendix 3: Derivation of simplified updates in Algorithm 2
It can be shown (see Wand 2014; Tan and Nott 2013) that the natural parameter of \(q(\lambda )=N(\mu _\lambda ^q,{\varSigma }_\lambda ^q)\) is
\[
\begin{bmatrix} {{\varSigma }_\lambda ^q}^{-1}\mu _\lambda ^q \\[2pt] -\tfrac{1}{2} D_d^T \text {vec}({{\varSigma }_\lambda ^q}^{-1}) \end{bmatrix},
\]
where \(D_d\) is the unique \(d^2 \times \tfrac{1}{2}d(d+1)\) matrix that transforms \(\text {vech}(A)\) into \(\text {vec}(A)\) for any \(d \times d\) symmetric matrix A, that is, \(D_d\text {vech}(A)=\text {vec}(A)\). We use \(\text {vech}(A)\) to denote the \(\tfrac{1}{2}d(d+1) \times 1\) vector obtained from \(\text {vec}(A)\) by eliminating all supradiagonal elements of A. Magnus and Neudecker (1988) is a good reference for the matrix differential calculus involved in the derivation below. From (13) and (Tan and Nott 2013, pg. 7), we have
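The duplication matrix \(D_d\) can be built explicitly, which is useful when implementing the natural parameter updates. The sketch below uses column-major (Fortran) ordering for \(\text {vec}\); the function name is illustrative.

```python
import numpy as np

def duplication_matrix(d):
    """Build the d^2 x d(d+1)/2 duplication matrix D_d satisfying
    D_d @ vech(A) == vec(A) for any d x d symmetric A, where vec stacks
    the columns of A and vech stacks the columns of its lower triangle."""
    D = np.zeros((d * d, d * (d + 1) // 2))
    col = 0
    for j in range(d):
        for i in range(j, d):          # lower triangle, including diagonal
            D[j * d + i, col] = 1.0    # position of a_ij in vec(A)
            D[i * d + j, col] = 1.0    # its symmetric counterpart a_ji
            col += 1
    return D
```

For a diagonal entry (\(i=j\)) both assignments address the same element of \(D\), so each row of \(D_d\) contains exactly one 1.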
where \(\dfrac{\partial S_a}{\partial \text {vec}({\varSigma }_\lambda ^q)}\) and \(\dfrac{\partial S_a}{\partial \mu _\lambda ^q}\) are evaluated at
Let
The first line of (21) simplifies to
The second line of (21) gives
Cite this article
Tan, L.S.L., Ong, V.M.H., Nott, D.J. et al. Variational inference for sparse spectrum Gaussian process regression. Stat Comput 26, 1243–1261 (2016). https://doi.org/10.1007/s11222-015-9600-7