Rapid Convergence of the Unadjusted Langevin Algorithm: Isoperimetry Suffices

Chapter in Geometric Aspects of Functional Analysis

Part of the book series: Lecture Notes in Mathematics (LNM, volume 2327)

Abstract

We study the Unadjusted Langevin Algorithm (ULA) for sampling from a probability distribution \(\nu = e^{-f}\) on \(\mathbb {R}^n\). We prove a convergence guarantee in Kullback-Leibler (KL) divergence assuming \(\nu \) satisfies a log-Sobolev inequality and the Hessian of f is bounded. Notably, we do not assume convexity or bounds on higher derivatives. We prove convergence guarantees in Rényi divergence of order \(q > 1\) assuming the limit of ULA satisfies isoperimetry, namely either the log-Sobolev or Poincaré inequality. We also prove a bound on the bias of the limiting distribution of ULA assuming third-order smoothness of f, without requiring isoperimetry.

This work was supported in part by NSF awards CCF-1717349, DMS-1839323, CCF-2007443, and CCF-2106644.


Notes

  1. Recall for \(\nu = \mathcal {N}(0,\frac {1}{\alpha } I)\) and \(\rho = \mathcal {N}(0,\frac {1}{\beta } I)\) on \(\mathbb {R}^n\), the relative Fisher information is \(J_\nu (\rho ) = \frac {n}{\beta } (\beta -\alpha )^2\) (a short verification is given after these notes).

  2. Recall \(W_1(\rho ,\nu ) = \sup \{ \mathbb {E}_\rho [g] - \mathbb {E}_\nu [g] \colon g \text{ is }1\text{-Lipschitz}\}\).
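
With the chapter's definition of relative Fisher information, \(J_\nu (\rho ) = \mathbb {E}_\rho [\|\nabla \log \frac {\rho }{\nu }\|{}^2]\), the formula in footnote 1 follows from a one-line computation (added here as a check): for \(\nu = \mathcal {N}(0,\frac {1}{\alpha } I)\) and \(\rho = \mathcal {N}(0,\frac {1}{\beta } I)\) we have \(\nabla \log \frac {\rho (x)}{\nu (x)} = (\alpha -\beta ) x\), so

$$\displaystyle \begin{aligned} J_\nu(\rho) = (\beta-\alpha)^2 \, \mathbb{E}_\rho[\|x\|^2] = \frac{n}{\beta} (\beta-\alpha)^2. \end{aligned} $$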

References

  1. M. Abadi, A. Chu, I. Goodfellow, H.B. McMahan, I. Mironov, K. Talwar, L. Zhang, Deep learning with differential privacy, in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (ACM, 2016), pp. 308–318

  2. D. Applegate, R. Kannan, Sampling and integration of near log-concave functions, in Proceedings of the Twenty-third Annual ACM Symposium on Theory of Computing, STOC ’91, New York, NY, USA (ACM, 1991), pp. 156–163

  3. J.C. Baez, Rényi entropy and free energy. Preprint. arXiv:1102.2098 (2011)

  4. S. Bai, T. Lepoint, A. Roux-Langlois, A. Sakzad, D. Stehlé, R. Steinfeld, Improved security proofs in lattice-based cryptography: using the Rényi divergence rather than the statistical distance. J. Cryptol. 31(2), 610–640 (2018)

  5. D. Bakry, M. Émery, Diffusions hypercontractives, in Séminaire de Probabilités XIX 1983/84 (Springer, 1985), pp. 177–206

  6. D. Bakry, F. Barthe, P. Cattiaux, A. Guillin et al., A simple proof of the Poincaré inequality for a large class of probability measures. Electron. Commun. Probab. 13, 60–66 (2008)

  7. K. Balasubramanian, S. Chewi, M.A. Erdogdu, A. Salim, M. Zhang, Towards a theory of non-log-concave sampling: first-order stationarity guarantees for Langevin Monte Carlo, in Proceedings of the 2022 Conference on Learning Theory. PMLR (2022)

  8. J.B. Bardet, N. Gozlan, F. Malrieu, P.A. Zitt, Functional inequalities for Gaussian convolutions of compactly supported measures: explicit bounds and dimension dependence. Bernoulli 24(1), 333–353 (2018)

  9. E. Bernton, Langevin Monte Carlo and JKO splitting, in Conference on Learning Theory, COLT 2018, Stockholm, Sweden, 6-9 July 2018 (2018), pp. 1777–1798

  10. S.G. Bobkov, F. Götze, Exponential integrability and transportation cost related to logarithmic Sobolev inequalities. J. Funct. Anal. 163(1), 1–28 (1999)

  11. S.G. Bobkov, I. Gentil, M. Ledoux, Hypercontractivity of Hamilton–Jacobi equations. J. Math. Pures Appl. 80(7), 669–696 (2001)

  12. S.G. Bobkov, G.P. Chistyakov, F. Götze, Rényi divergence and the central limit theorem. Ann. Probab. 47(1), 270–323 (2019)

  13. M. Bun, T. Steinke, Concentrated differential privacy: simplifications, extensions, and lower bounds, in Theory of Cryptography Conference (Springer, 2016), pp. 635–658

  14. Y. Cao, J. Lu, Y. Lu, Exponential decay of Rényi divergence under Fokker–Planck equations. J. Stat. Phys. 176, 1172–1184 (2019)

  15. D. Chafaï, Entropies, convexity, and functional inequalities: on \(\phi \)-entropies and \(\phi \)-Sobolev inequalities. J. Math. Kyoto Univ. 44(2), 325–363 (2004)

  16. D. Chafai, F. Malrieu, On fine properties of mixtures with respect to concentration of measure and Sobolev type inequalities, in Annales de l’IHP Probabilités et statistiques, vol. 46 (2010), pp. 72–96

  17. Z. Chen, S.S. Vempala, Optimal convergence rate of Hamiltonian Monte Carlo for strongly logconcave distributions. Theory Comput. 18(9), 1–18 (2022)

  18. Y. Chen, S. Chewi, A. Salim, A. Wibisono, Improved analysis for a proximal algorithm for sampling, in Proceedings of the 2022 Conference on Learning Theory. PMLR (2022)

  19. X. Cheng, P. Bartlett, Convergence of Langevin MCMC in KL-divergence, in Proceedings of Algorithmic Learning Theory, ed. by F. Janoos, M. Mohri, K. Sridharan, volume 83 of Proceedings of Machine Learning Research. PMLR, 07–09 Apr (2018), pp. 186–211

  20. X. Cheng, N.S. Chatterji, Y. Abbasi-Yadkori, P.L. Bartlett, M.I. Jordan, Sharp convergence rates for Langevin dynamics in the nonconvex setting. Preprint. arXiv:1805.01648 (2018)

  21. S. Chewi, T. Le Gouic, C. Lu, T. Maunu, P. Rigollet, A. Stromme, Exponential ergodicity of mirror-Langevin diffusions, in Advances in Neural Information Processing Systems, vol. 33 (2020), pp. 19573–19585

  22. S. Chewi, M.A. Erdogdu, M.B. Li, R. Shen, M. Zhang, Analysis of Langevin Monte Carlo from Poincaré to log-Sobolev, in Proceedings of the 2022 Conference on Learning Theory. PMLR (2022)

  23. T.A. Courtade, Bounds on the Poincaré constant for convolution measures. Ann. l’Inst. Henri Poincaré Probab. Stat. 56(1), 566–579 (2020)

  24. I. Csiszár, Generalized cutoff rates and Rényi’s information measures. IEEE Trans. Inf. Theory 41(1), 26–34 (1995)

  25. A. Dalalyan, Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent, in Proceedings of the 2017 Conference on Learning Theory, volume 65 of Proceedings of Machine Learning Research. PMLR, 07–10 Jul (2017), pp. 678–689

  26. A.S. Dalalyan, A. Karagulyan, User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient, in Stochastic Processes and their Applications (2019)

  27. X. Dong, The gravity dual of Rényi entropy. Nat. Commun. 7, 12472 (2016)

  28. A. Durmus, E. Moulines, E. Saksman, On the convergence of Hamiltonian Monte Carlo. Preprint. arXiv:1705.00166 (2017)

  29. A. Durmus, S. Majewski, B. Miasojedow, Analysis of Langevin Monte Carlo via convex optimization. J. Mach. Learn. Res. 20(1), 2666–2711 (2019)

  30. R. Dwivedi, Y. Chen, M.J. Wainwright, B. Yu, Log-concave sampling: Metropolis-Hastings algorithms are fast!, in Conference on Learning Theory, COLT 2018, Stockholm, Sweden, 6–9 July (2018), pp. 793–797

  31. C. Dwork, G.N. Rothblum, Concentrated differential privacy. Preprint. arXiv:1603.01887 (2016)

  32. A. Eberle, A. Guillin, R. Zimmer, Couplings and quantitative contraction rates for Langevin dynamics. Ann. Probab. 47(4), 1982–2010 (2019)

  33. M.A. Erdogdu, R. Hosseinzadeh, On the convergence of Langevin Monte Carlo: the interplay between tail growth and smoothness, in Proceedings of Thirty Fourth Conference on Learning Theory, ed. by M. Belkin, S. Kpotufe, volume 134 of Proceedings of Machine Learning Research. PMLR, 15–19 Aug (2021), pp. 1776–1822

  34. M.A. Erdogdu, R. Hosseinzadeh, M.S. Zhang, Convergence of Langevin Monte Carlo in chi-squared and Rényi divergence, in International Conference on Artificial Intelligence and Statistics. PMLR (2022), pp. 8151–8175

  35. A. Ganesh, K. Talwar, Faster differentially private samplers via Rényi divergence analysis of discretized Langevin MCMC, in Advances in Neural Information Processing Systems, ed. by H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, H. Lin, vol. 33 (Curran Associates, 2020), pp. 7222–7233

  36. A. Garbuno-Inigo, N. Nüsken, S. Reich, Affine invariant interacting Langevin dynamics for Bayesian inference. SIAM J. Appl. Dynam. Syst. 19(3), 1633–1658 (2020)

  37. K. Gatmiry, S.S. Vempala, Convergence of the Riemannian Langevin algorithm. Preprint. arXiv:2204.10818 (2022)

  38. J. Gorham, L. Mackey, Measuring sample quality with kernels, in Proceedings of the 34th International Conference on Machine Learning, ed. by D. Precup, Y.W. Teh, volume 70 of Proceedings of Machine Learning Research, International Convention Centre, Sydney, Australia, 06–11 Aug 2017. PMLR (2017), pp. 1292–1301

  39. L. Gross, Logarithmic Sobolev inequalities. Am. J. Math. 97(4), 1061–1083 (1975)

  40. P. Harremoës, Interpretations of Rényi entropies and divergences. Physica A Stat. Mech. Appl. 365(1), 57–62 (2006)

  41. Y. He, A.B. Hamza, H. Krim, A generalized divergence measure for robust image registration. IEEE Trans. Signal Process. 51(5), 1211–1220 (2003)

  42. Y. He, K. Balasubramanian, M.A. Erdogdu, Heavy-tailed sampling via transformed unadjusted Langevin algorithm. Preprint. arXiv:2201.08349 (2022)

  43. R. Holley, D. Stroock, Logarithmic Sobolev inequalities and stochastic Ising models. J. Stat. Phys. 46(5), 1159–1194 (1987)

  44. R. Holley, D. Stroock, Simulated annealing via Sobolev inequalities. Commun. Math. Phys. 115(4), 553–569 (1988)

  45. M. Iwamoto, J. Shikata, Information theoretic security for encryption based on conditional Rényi entropies, in International Conference on Information Theoretic Security (Springer, 2013), pp. 103–121

  46. Q. Jiang, Mirror Langevin Monte Carlo: the case under isoperimetry, in Advances in Neural Information Processing Systems, ed. by M. Ranzato, A. Beygelzimer, K. Nguyen, P.S. Liang, J.W. Vaughan, Y. Dauphin, vol. 34 (Curran Associates, 2021)

  47. R. Jordan, D. Kinderlehrer, F. Otto, The variational formulation of the Fokker–Planck equation. SIAM J. Math. Anal. 29(1), 1–17 (1998)

  48. R. Kannan, L. Lovász, M. Simonovits, Random walks and an \(O^*(n^5)\) volume algorithm for convex bodies. Random Struct. Algorithms 11, 1–50 (1997)

  49. M. Ledoux, Concentration of measure and logarithmic Sobolev inequalities. Sémin. Probab. Strasbourg 33, 120–216 (1999)

  50. Y.T. Lee, S.S. Vempala, Convergence rate of Riemannian Hamiltonian Monte Carlo and faster polytope volume computation, in Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing (ACM, 2018), pp. 1115–1121

  51. M.B. Li, M.A. Erdogdu, Riemannian Langevin algorithm for solving semidefinite programs. Preprint. arXiv:2010.11176 (2020)

  52. Y. Li, R.E. Turner, Rényi divergence variational inference, in Advances in Neural Information Processing Systems, vol. 29, ed. by D.D. Lee, M. Sugiyama, U.V. Luxburg, I. Guyon, R. Garnett (Curran Associates, 2016), pp. 1073–1081

  53. X. Li, Y. Wu, L. Mackey, M.A. Erdogdu, Stochastic Runge–Kutta accelerates Langevin Monte Carlo and beyond, in Advances in Neural Information Processing Systems, vol. 32 (2019)

  54. L. Lovász, S. Vempala, Fast algorithms for logconcave functions: Sampling, rounding, integration and optimization, in FOCS (2006), pp. 57–68

  55. L. Lovász, S.S. Vempala, Hit-and-run from a corner. SIAM J. Comput. 35(4), 985–1005 (2006)

  56. L. Lovász, S. Vempala, The geometry of logconcave functions and sampling algorithms. Random Struct. Algorithms 30(3), 307–358 (2007)

  57. Y.A. Ma, N.S. Chatterji, X. Cheng, N. Flammarion, P.L. Bartlett, M.I. Jordan, Is there an analog of Nesterov acceleration for gradient-based MCMC? Bernoulli 27(3), 1942–1992 (2021)

  58. M.C. Mackey, Time's Arrow: The Origins of Thermodynamic Behavior (Springer, 1992)

  59. O. Mangoubi, A. Smith, Rapid mixing of Hamiltonian Monte Carlo on strongly log-concave distributions. Preprint. arXiv:1708.07114 (2017)

  60. O. Mangoubi, N. Vishnoi, Dimensionally tight bounds for second-order Hamiltonian Monte Carlo, in Advances in Neural Information Processing Systems, vol. 31 (Curran Associates, 2018), pp. 6027–6037

  61. O. Mangoubi, N.K. Vishnoi, Nonconvex sampling with the Metropolis-adjusted Langevin algorithm, in Conference on Learning Theory. PMLR (2019), pp. 2259–2293

  62. Y. Mansour, M. Mohri, A. Rostamizadeh, Multiple source adaptation and the Rényi divergence, in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (AUAI Press, 2009), pp. 367–374

  63. G. Menz, A. Schlichting, Poincaré and logarithmic Sobolev inequalities by decomposition of the energy landscape. Ann. Probab. 42(5), 1809–1884 (2014)

  64. I. Mironov, Rényi differential privacy, in 2017 IEEE 30th Computer Security Foundations Symposium (CSF) (IEEE, 2017), pp. 263–275

  65. D. Morales, L. Pardo, I. Vajda, Rényi statistics in directed families of exponential experiments. Stat. J. Theor. Appl. Stat. 34(2), 151–174 (2000)

  66. F. Otto, C. Villani, Generalization of an inequality by Talagrand and links with the logarithmic Sobolev inequality. J. Funct. Anal. 173(2), 361–400 (2000)

  67. M. Raginsky, A. Rakhlin, M. Telgarsky, Non-convex learning via stochastic gradient Langevin dynamics: a nonasymptotic analysis, in Proceedings of the 2017 Conference on Learning Theory, ed. by S. Kale, O. Shamir, volume 65 of Proceedings of Machine Learning Research, Amsterdam, Netherlands, 07–10 Jul 2017. PMLR (2017), pp. 1674–1703

  68. A. Rényi et al., On measures of entropy and information, in Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics (The Regents of the University of California, 1961)

  69. O.S. Rothaus, Diffusion on compact Riemannian manifolds and logarithmic Sobolev inequalities. J. Funct. Anal. 42(1), 102–109 (1981)

  70. M. Talagrand, Transportation cost for Gaussian and other product measures. Geom. Funct. Anal. 6, 587–600 (1996)

  71. T. Van Erven, P. Harremos, Rényi divergence and Kullback-Leibler divergence. IEEE Trans. Inf. Theory 60(7), 3797–3820 (2014)

  72. S. Vempala, A. Wibisono, Rapid convergence of the unadjusted Langevin algorithm: isoperimetry suffices, in Advances in Neural Information Processing Systems, vol. 32 (Curran Associates, 2019)

  73. C. Villani, A short proof of the concavity of entropy power. IEEE Trans. Inf. Theory 46(4), 1695–1696 (2000)

  74. C. Villani, Topics in Optimal Transportation. Number 58 in Graduate Studies in Mathematics (American Mathematical Society, 2003)

  75. F.Y. Wang, J. Wang, Functional inequalities for convolution of probability measures, in Annales de l’Institut Henri Poincaré, Probabilités et Statistiques, vol. 52 (Institut Henri Poincaré, 2016), pp. 898–914

  76. X. Wang, Q. Lei, I. Panageas, Fast convergence of Langevin dynamics on manifold: Geodesics meet log-Sobolev, in Advances in Neural Information Processing Systems, ed. by H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, H. Lin, vol. 33 (Curran Associates, 2020), pp. 18894–18904

  77. A. Wibisono, Sampling as optimization in the space of measures: the Langevin dynamics as a composite optimization problem, in Conference on Learning Theory, COLT 2018, Stockholm, Sweden, 6–9 July 2018 (2018), pp. 2093–3027

  78. A. Wibisono, Proximal Langevin algorithm: rapid convergence under isoperimetry. Preprint. arXiv:1911.01469 (2019)


Acknowledgements

The authors thank Kunal Talwar for explaining the privacy motivation and application of Rényi divergence to data privacy; Yu Cao, Jianfeng Lu, and Yulong Lu for alerting us to their work [14] on Rényi divergence; Xiang Cheng and Peter Bartlett for helpful comments on an earlier version of this paper; and Sinho Chewi for communicating Theorem 8 to us.

Author information

Corresponding author

Correspondence to Santosh S. Vempala.

Appendix

1.1 Review on Notation and Basic Properties

Throughout, we represent a probability distribution \(\rho \) on \(\mathbb {R}^n\) via its probability density function with respect to the Lebesgue measure, so \(\rho \colon \mathbb {R}^n \to \mathbb {R}\) with \(\int _{\mathbb {R}^n} \rho (x) dx = 1\). We typically assume \(\rho \) has full support and smooth density, so \(\rho (x) > 0\) and \(x \mapsto \rho (x)\) is differentiable. Given a function \(f \colon \mathbb {R}^n \to \mathbb {R}\), we denote the expected value of f under \(\rho \) by

$$\displaystyle \begin{aligned} \mathbb{E}_\rho[f] = \int_{\mathbb{R}^n} f(x) \rho(x) \, dx. \end{aligned} $$

We use the Euclidean inner product \(\langle x,y \rangle = \sum _{i=1}^n x_i y_i\) for \(x = (x_i)_{1 \le i \le n}, y = (y_i)_{1 \le i \le n} \in \mathbb {R}^n\). For symmetric matrices \(A, B \in \mathbb {R}^{n \times n}\), let \(A \preceq B\) denote that \(B-A\) is positive semidefinite. For \(\mu \in \mathbb {R}^n\), \(\Sigma \succ 0\), let \(\mathcal {N}(\mu ,\Sigma )\) denote the Gaussian distribution on \(\mathbb {R}^n\) with mean \(\mu \) and covariance matrix \(\Sigma \).

Given a smooth function \(f \colon \mathbb {R}^n \to \mathbb {R}\), its gradient \(\nabla f \colon \mathbb {R}^n \to \mathbb {R}^n\) is the vector of partial derivatives:

$$\displaystyle \begin{aligned} \nabla f(x) = \left(\frac{\partial f(x)}{\partial x_1}, \dots, \frac{\partial f(x)}{\partial x_n} \right). \end{aligned} $$

The Hessian \(\nabla ^2 f \colon \mathbb {R}^n \to \mathbb {R}^{n \times n}\) is the matrix of second partial derivatives:

$$\displaystyle \begin{aligned} \nabla^2 f(x) = \left(\frac{\partial^2 f(x)}{\partial x_i \, \partial x_j} \right)_{1 \le i,j \le n}. \end{aligned} $$

The Laplacian \(\Delta f \colon \mathbb {R}^n \to \mathbb {R}\) is the trace of its Hessian:

$$\displaystyle \begin{aligned} \Delta f(x) = \mathrm{Tr}(\nabla^2 f(x)) = \sum_{i=1}^n \frac{\partial ^2 f(x)}{\partial x_i^2}. \end{aligned} $$

Given a smooth vector field \(v = (v_1,\dots ,v_n) \colon \mathbb {R}^n \to \mathbb {R}^n\), its divergence \(\nabla \cdot v \colon \mathbb {R}^n \to \mathbb {R}\) is

$$\displaystyle \begin{aligned} (\nabla \cdot v)(x) = \sum_{i=1}^n \frac{\partial v_i(x)}{\partial x_i}. \end{aligned} $$

In particular, the divergence of the gradient is the Laplacian:

$$\displaystyle \begin{aligned} (\nabla \cdot \nabla f)(x) = \sum_{i=1}^n \frac{\partial ^2 f(x)}{\partial x_i^2} = \Delta f(x). \end{aligned} $$

For any function \(f \colon \mathbb {R}^n \to \mathbb {R}\) and vector field \(v \colon \mathbb {R}^n \to \mathbb {R}^n\) with sufficiently fast decay at infinity, we have the following integration by parts formula:

$$\displaystyle \begin{aligned} \int_{\mathbb{R}^n} \langle v(x), \nabla f(x) \rangle dx = -\int_{\mathbb{R}^n} f(x) (\nabla \cdot v)(x) dx. \end{aligned} $$

Furthermore, applying this formula with \(v = \nabla g\), and then with the roles of f and g exchanged, gives for any two functions \(f, g \colon \mathbb {R}^n \to \mathbb {R}\) with sufficiently fast decay,

$$\displaystyle \begin{aligned} \int_{\mathbb{R}^n} f(x) \Delta g(x) dx = -\int_{\mathbb{R}^n} \langle \nabla f(x), \nabla g(x) \rangle dx = \int_{\mathbb{R}^n} g(x) \Delta f(x) dx. \end{aligned} $$

When the argument is clear, we omit the argument \((x)\) in the formulae for brevity. For example, the last integral above becomes

$$\displaystyle \begin{aligned} {} \int f \, \Delta g \, dx = -\int \langle \nabla f, \nabla g \rangle \, dx = \int g \, \Delta f \, dx. \end{aligned} $$
(B.1)
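
As an illustrative aside (not from the chapter), the identity (B.1) can be sanity-checked symbolically in one dimension; the choice of test functions and the use of sympy below are ours.

```python
# Symbolic sanity check of (B.1) in dimension one: for rapidly decaying f, g,
# the integrals  ∫ f Δg dx,  -∫ <∇f, ∇g> dx,  ∫ g Δf dx  all agree.
import sympy as sp

x = sp.symbols('x', real=True)
f = sp.exp(-x**2)        # illustrative test function f
g = sp.exp(-2 * x**2)    # illustrative test function g

lhs = sp.integrate(f * sp.diff(g, x, 2), (x, -sp.oo, sp.oo))            # ∫ f g'' dx
mid = -sp.integrate(sp.diff(f, x) * sp.diff(g, x), (x, -sp.oo, sp.oo))  # -∫ f' g' dx
rhs = sp.integrate(g * sp.diff(f, x, 2), (x, -sp.oo, sp.oo))            # ∫ g f'' dx

print(sp.simplify(lhs - mid), sp.simplify(mid - rhs))  # both print 0, consistent with (B.1)
```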

Derivation of the Fokker-Planck Equation

Consider a stochastic differential equation

$$\displaystyle \begin{aligned} {} dX_t = v(X_t) \, dt + \sqrt{2} \, dW_t \end{aligned} $$
(B.2)

where \(v \colon \mathbb {R}^n \to \mathbb {R}^n\) is a smooth vector field and \((W_t)_{t \ge 0}\) is the Brownian motion on \(\mathbb {R}^n\) with \(W_0 = 0\).

We will show that if \(X_t\) evolves following (B.2), then its probability density function \(\rho _t(x)\) evolves following the Fokker-Planck equation:

$$\displaystyle \begin{aligned} {} \frac{\partial \rho_t}{\partial t} = -\nabla \cdot (\rho_t v) + \Delta \rho_t. \end{aligned} $$
(B.3)

We can derive this heuristically as follows; we refer to standard textbooks for a rigorous derivation [58].

For any smooth test function \(\phi \colon \mathbb {R}^n \to \mathbb {R}\), let us compute the time derivative of the expectation

$$\displaystyle \begin{aligned} A(t) = \mathbb{E}_{\rho_t}[\phi] = \mathbb{E}[\phi(X_t)]. \end{aligned} $$

On the one hand, we can compute this as

$$\displaystyle \begin{aligned} {} \dot A(t) = \frac{d}{dt} A(t) = \frac{d}{dt} \int_{\mathbb{R}^n} \rho_t(x) \phi(x) \, dx = \int_{\mathbb{R}^n} \frac{\partial \rho_t(x)}{\partial t} \phi(x) \, dx. \end{aligned} $$
(B.4)

On the other hand, by (B.2), for small \(\epsilon > 0\) we have

$$\displaystyle \begin{aligned} X_{t+\epsilon } &= X_t + \int_t^{t+\epsilon } v(X_s) ds + \sqrt{2} (W_{t+\epsilon }-W_t) \\ &= X_t + \epsilon v(X_t) + \sqrt{2} (W_{t+\epsilon }-W_t) + O(\epsilon ^2) \\ &\stackrel{d}{=} X_t + \epsilon v(X_t) + \sqrt{2\epsilon } Z + O(\epsilon ^2) \end{aligned} $$

where \(Z \sim \mathcal {N}(0,I)\) is independent of \(X_t\), since \(W_{t+\epsilon }-W_t \sim \mathcal {N}(0,\epsilon I)\). Then by Taylor expansion,

$$\displaystyle \begin{aligned} \phi(X_{t+\epsilon }) &\stackrel{d}{=} \phi\left(X_t + \epsilon v(X_t) + \sqrt{2\epsilon } Z + O(\epsilon ^2)\right) \\ &= \phi(X_t) + \epsilon \langle \nabla \phi(X_t), v(X_t) \rangle + \sqrt{2\epsilon } \langle \nabla \phi(X_t), Z \rangle \\ &\quad + \frac{1}{2} 2\epsilon \langle Z, \nabla^2 \phi(X_t) Z \rangle + O(\epsilon ^{\frac{3}{2}}). \end{aligned} $$

Now we take expectation on both sides. Since \(Z \sim \mathcal {N}(0,I)\) is independent of \(X_t\),

$$\displaystyle \begin{aligned} A(t+\epsilon ) &= \mathbb{E}[\phi(X_{t+\epsilon })]\\ &= \mathbb{E}\Big[\phi(X_t) + \epsilon \langle \nabla \phi(X_t), v(X_t) \rangle + \sqrt{2\epsilon } \langle \nabla \phi(X_t), Z \rangle \\ &\qquad + \epsilon \langle Z, \nabla^2 \phi(X_t) Z \rangle \Big] + O(\epsilon ^{\frac{3}{2}}) \\ &= A(t)+ \epsilon \left(\mathbb{E}[\langle \nabla \phi(X_t), v(X_t) \rangle] + \mathbb{E}[\Delta \phi(X_t)]\right) + O(\epsilon ^{\frac{3}{2}}). \end{aligned} $$

Therefore, by integration by parts, this second approach gives

$$\displaystyle \begin{aligned} \dot A(t) &= \lim_{ \epsilon \to 0} \frac{A(t+\epsilon )-A(t)}{\epsilon } \notag \\ &= \mathbb{E}[\langle \nabla \phi(X_t), v(X_t) \rangle] + \mathbb{E}[\Delta \phi(X_t)] \notag \\ &= \int_{\mathbb{R}^n} \langle \nabla \phi(x), \rho_t(x) v(x) \rangle dx + \int_{\mathbb{R}^n} \rho_t(x) \Delta \phi(x) \, dx \notag \\ &= -\int_{\mathbb{R}^n} \phi(x) \nabla \cdot (\rho_t v)(x) \, dx + \int_{\mathbb{R}^n} \phi(x) \Delta \rho_t(x) \, dx \notag \\ &= \int_{\mathbb{R}^n} \phi(x) \left(-\nabla \cdot (\rho_t v)(x) + \Delta \rho_t(x)\right) \, dx. {} \end{aligned} $$
(B.5)

Comparing (B.4) and (B.5), and since \(\phi \) is arbitrary, we conclude that

$$\displaystyle \begin{aligned} \frac{\partial \rho_t(x)}{\partial t} = -\nabla \cdot (\rho_t v)(x) + \Delta \rho_t(x) \end{aligned} $$

as claimed in (B.3).

When \(v = -\nabla f\), the stochastic differential equation (B.2) becomes the Langevin dynamics (7) from Sect. 2.3, and the Fokker-Planck equation (B.3) becomes (8).
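
As a quick consistency check (a standard computation, added here for completeness), the target \(\nu = e^{-f}\) is a stationary solution of the Fokker-Planck equation (B.3) with \(v = -\nabla f\): since \(\nabla \nu = -\nu \nabla f\),

$$\displaystyle \begin{aligned} -\nabla \cdot \big(\nu \, (-\nabla f)\big) + \Delta \nu = \nabla \cdot (\nu \nabla f) + \nabla \cdot (\nabla \nu) = \nabla \cdot (\nu \nabla f - \nu \nabla f) = 0, \end{aligned} $$

so \(\frac {\partial \rho _t}{\partial t} = 0\) when \(\rho _t = \nu \).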

In the proof of Lemma 3, we also apply the Fokker-Planck equation (B.3) when \(v = -\nabla f(x_0)\) is a constant vector field to derive the evolution equation (30) for one step of ULA.
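
For concreteness, recall that ULA discretizes the Langevin dynamics with step size \(h\) via \(x_{k+1} = x_k - h \nabla f(x_k) + \sqrt {2h} \, z_k\), \(z_k \sim \mathcal {N}(0,I)\). The following minimal sketch (ours, not from the chapter; the quadratic potential, step size, and horizon are illustrative choices) runs ULA for the standard Gaussian target \(f(x) = \|x\|{}^2/2\). For this \(f\) the update is a linear Gaussian recursion with stationary variance \(2h/(1-(1-h)^2) = 1/(1-h/2)\) per coordinate, slightly above the target variance 1, which illustrates the bias of the limiting distribution of ULA.

```python
# Minimal ULA sketch (illustrative, not from the chapter): sample from the
# standard Gaussian target f(x) = ||x||^2 / 2, so grad f(x) = x. For this
# quadratic f the iterates form a linear Gaussian recursion whose stationary
# variance per coordinate is 1 / (1 - h/2), slightly above the target value 1.
import numpy as np

rng = np.random.default_rng(0)
n, h, steps, chains = 2, 0.05, 2000, 5000   # dimension, step size, iterations, parallel chains

def grad_f(x):
    return x  # gradient of f(x) = ||x||^2 / 2

x = rng.standard_normal((chains, n))        # arbitrary initialization
for _ in range(steps):
    x = x - h * grad_f(x) + np.sqrt(2 * h) * rng.standard_normal((chains, n))

print("empirical variance per coordinate:", x.var(axis=0))
print("predicted ULA limit variance     :", 1 / (1 - h / 2))
```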

1.2 Remaining Proofs

1.2.1 Proof of Lemma 16

Proof of Lemma 16

Let \(g \colon \mathbb {R}^n \to \mathbb {R}\) be a smooth function, and let \(\tilde g \colon \mathbb {R}^n \to \mathbb {R}\) be the function \(\tilde g(x) = g(T(x))\). Let \(X \sim \nu \), so \(T(X) \sim \tilde \nu \). Note that

$$\displaystyle \begin{aligned} \mathbb{E}_{\tilde \nu}[g^2] &= \mathbb{E}_{X \sim \nu}[g(T(X))^2] = \mathbb{E}_{\nu}[\tilde g^2], \\ \mathbb{E}_{\tilde \nu}[g^2 \log g^2] &= \mathbb{E}_{X \sim \nu}[g(T(X))^2 \log g(T(X))^2] = \mathbb{E}_{\nu}[\tilde g^2 \log \tilde g^2]. \end{aligned} $$

Furthermore, we have \(\nabla \tilde g(x) = \nabla T(x) \, \nabla g(T(x))\). Since T is L-Lipschitz, \(\|\nabla T(x)\| \le L\). Then

$$\displaystyle \begin{aligned} \|\nabla \tilde g(x)\| \le \|\nabla T(x)\| \, \|\nabla g(T(x))\| \le L \|\nabla g(T(x))\|. \end{aligned} $$

This implies

$$\displaystyle \begin{aligned} \mathbb{E}_{\tilde \nu}[\|\nabla g\|{}^2] = \mathbb{E}_{X \sim \nu}[\|\nabla g(T(X))\|{}^2] \ge \frac{\mathbb{E}_{\nu}[\|\nabla \tilde g\|{}^2]}{L^2}. \end{aligned} $$

Therefore,

$$\displaystyle \begin{aligned} \frac{\mathbb{E}_{\tilde \nu}[\|\nabla g\|{}^2]}{\mathbb{E}_{\tilde \nu}[g^2 \log g^2] \,{-}\, \mathbb{E}_{\tilde \nu}[g^2] \log \mathbb{E}_{\tilde \nu}[g^2]} &\,{\ge}\, \frac{1}{L^2} \, \frac{\mathbb{E}_{\nu}[\|\nabla \tilde g\|{}^2]}{\big(\mathbb{E}_{\nu}[\tilde g^2 \log \tilde g^2] \,{-}\, \mathbb{E}_{\nu}[\tilde g^2] \log \mathbb{E}_{\nu}[\tilde g^2]\big)}\\ &\,{\ge}\, \frac{\alpha}{2L^2} \end{aligned} $$

where the last inequality follows from the assumption that \(\nu \) satisfies LSI with constant \(\alpha \). This shows that \(\tilde \nu \) satisfies LSI with constant \(\alpha /L^2\), as desired. □
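
As an illustrative consistency check (ours, using the standard fact that \(\mathcal {N}(0,\sigma ^2 I)\) satisfies LSI with constant \(1/\sigma ^2\)): take \(\nu = \mathcal {N}(0,\frac {1}{\alpha } I)\), which satisfies LSI with constant \(\alpha \), and the linear map \(T(x) = Lx\), which is L-Lipschitz. Then the distribution of \(T(X)\) for \(X \sim \nu \) is

$$\displaystyle \begin{aligned} \tilde \nu = \mathcal{N}\left(0, \frac{L^2}{\alpha} I\right), \end{aligned} $$

which satisfies LSI with constant exactly \(\alpha /L^2\); thus the rate in Lemma 16 is attained by linear maps and cannot be improved in general.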

Proof of Lemma 17

Proof of Lemma 17

We recall the following convolution property of LSI [15]: If \(\nu , \tilde \nu \) satisfy LSI with constants \(\alpha , \tilde \alpha > 0\), respectively, then \(\nu \ast \tilde \nu \) satisfies LSI with constant \(\left (\frac {1}{\alpha }+\frac {1}{\tilde \alpha }\right )^{-1}\). Since \(\mathcal {N}(0,2tI)\) satisfies LSI with constant \(\frac {1}{2t}\), the claim follows: \(\nu \ast \mathcal {N}(0,2tI)\) satisfies LSI with constant \(\left (\frac {1}{\alpha }+2t\right )^{-1} = \frac {\alpha }{1+2t\alpha }\). □

Proof of Lemma 19

Proof of Lemma 19

Let \(g \colon \mathbb {R}^n \to \mathbb {R}\) be a smooth function, and let \(\tilde g \colon \mathbb {R}^n \to \mathbb {R}\) be the function \(\tilde g(x) = g(T(x))\). Let \(X \sim \nu \), so \(T(X) \sim \tilde \nu \). Note that

$$\displaystyle \begin{aligned} \mathrm{Var}_{\tilde \nu}(g) &= \mathrm{Var}_{X \sim \nu}(g(T(X))) = \mathrm{Var}_{\nu}(\tilde g). \end{aligned} $$

Furthermore, we have \(\nabla \tilde g(x) = \nabla T(x) \, \nabla g(T(x))\). Since T is L-Lipschitz, \(\|\nabla T(x)\| \le L\). Then

$$\displaystyle \begin{aligned} \|\nabla \tilde g(x)\| \le \|\nabla T(x)\| \, \|\nabla g(T(x))\| \le L \|\nabla g(T(x))\|. \end{aligned} $$

This implies

$$\displaystyle \begin{aligned} \mathbb{E}_{\tilde \nu}[\|\nabla g\|{}^2] = \mathbb{E}_{X \sim \nu}[\|\nabla g(T(X))\|{}^2] \ge \frac{\mathbb{E}_{\nu}[\|\nabla \tilde g\|{}^2]}{L^2}. \end{aligned} $$

Therefore,

$$\displaystyle \begin{aligned} \frac{\mathbb{E}_{\tilde \nu}[\|\nabla g\|{}^2]}{\mathrm{Var}_{\tilde \nu}(g)} &\ge \frac{1}{L^2} \, \frac{\mathbb{E}_{\nu}[\|\nabla \tilde g\|{}^2]}{\mathrm{Var}_{\nu}(\tilde g)} \ge \frac{\alpha}{L^2} \end{aligned} $$

where the last inequality follows from the assumption that \(\nu \) satisfies the Poincaré inequality with constant \(\alpha \). This shows that \(\tilde \nu \) satisfies the Poincaré inequality with constant \(\alpha /L^2\), as desired. □

Proof of Lemma 20

Proof of Lemma 20

We recall the following convolution property of the Poincaré inequality [23]: If \(\nu , \tilde \nu \) satisfy the Poincaré inequality with constants \(\alpha , \tilde \alpha > 0\), respectively, then \(\nu \ast \tilde \nu \) satisfies the Poincaré inequality with constant \(\left (\frac {1}{\alpha }+\frac {1}{\tilde \alpha }\right )^{-1}\). Since \(\mathcal {N}(0,2tI)\) satisfies the Poincaré inequality with constant \(\frac {1}{2t}\), the claim follows: \(\nu \ast \mathcal {N}(0,2tI)\) satisfies the Poincaré inequality with constant \(\left (\frac {1}{\alpha }+2t\right )^{-1} = \frac {\alpha }{1+2t\alpha }\). □


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Vempala, S.S., Wibisono, A. (2023). Rapid Convergence of the Unadjusted Langevin Algorithm: Isoperimetry Suffices. In: Eldan, R., Klartag, B., Litvak, A., Milman, E. (eds) Geometric Aspects of Functional Analysis. Lecture Notes in Mathematics, vol 2327. Springer, Cham. https://doi.org/10.1007/978-3-031-26300-2_15
