Adaptive smoothing algorithms for nonsmooth composite convex minimization

Abstract

We propose an adaptive smoothing algorithm based on Nesterov’s smoothing technique in Nesterov (Math Prog 103(1):127–152, 2005) for solving “fully” nonsmooth composite convex optimization problems. Our method combines Nesterov’s accelerated proximal gradient scheme with a new homotopy strategy for the smoothness parameter. By an appropriate choice of smoothing functions, we develop a new algorithm that has the \(\mathcal {O}\left( \frac{1}{\varepsilon }\right) \)-worst-case iteration-complexity while preserving the same per-iteration complexity as in Nesterov’s method and allowing one to automatically update the smoothness parameter at each iteration. Then, we customize our algorithm to solve four special cases that cover various applications. We also specialize our algorithm to solve constrained convex optimization problems and show its convergence guarantee on the primal sequence of iterates. We demonstrate our algorithm through three numerical examples and compare it with other related algorithms.


References

  1. Argyriou, A., Signoretto, M., Suykens, J.A.K.: Hybrid conditional gradient-smoothing algorithms with applications to sparse and low rank regularization. In: Suykens, J.A.K., Signoretto, M., Argyriou, A. (eds.) Regularization, Optimization, Kernels, and Support Vector Machines, pp. 53–82. CRC Press, Boca Raton (2014)

  2. Baes, M., Bürgisser, M.: Smoothing techniques for solving semi-definite programs with many constraints. Optimization Online (2009)

  3. Bauschke, H.H., Combettes, P.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York (2011)

  4. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

  5. Beck, A., Teboulle, M.: Smoothing and first order methods: a unified framework. SIAM J. Optim. 22(2), 557–580 (2012)

  6. Becker, S., Bobin, J., Candès, E.J.: NESTA: a fast and accurate first-order method for sparse recovery. SIAM J. Imaging Sci. 4(1), 1–39 (2011)

  7. Becker, S., Candès, E.J., Grant, M.: Templates for convex cone problems with applications to sparse signal recovery. Math. Prog. Comput. 3(3), 165–218 (2011)

  8. Belloni, A., Chernozhukov, V., Wang, L.: Square-root LASSO: pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806 (2011)

  9. Ben-Tal, A., Nemirovski, A.: Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. MPS/SIAM Series on Optimization, vol. 3. SIAM, Philadelphia (2001)

  10. Bertsekas, D.P., Tsitsiklis, J.N.: Parallel and Distributed Computation: Numerical methods. Prentice Hall, Englewood Cliffs (1989)

  11. Boţ, R., Hendrich, C.: A variable smoothing algorithm for solving convex optimization problems. TOP 23(1), 124–150 (2012)

  12. Boţ, R.I., Hendrich, C.: A double smoothing technique for solving unconstrained nondifferentiable convex optimization problems. Comput. Optim. Appl. 54(2), 239–262 (2013)

  13. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)

  14. Chen, J., Burer, S.: A first-order smoothing technique for a class of large-scale linear programs. SIAM J. Optim. 24(2), 598–620 (2014)

  15. Combettes, P., Pesquet, J.-C.: Fixed-Point Algorithms for Inverse Problems in Science and Engineering, Chapter Proximal Splitting Methods in Signal Processing, pp. 185–212. Springer, New York (2011)

  16. Devolder, O., Glineur, F., Nesterov, Y.: Double smoothing technique for large-scale linearly constrained convex optimization. SIAM J. Optim. 22(2), 702–727 (2012)

  17. Goldfarb, D., Ma, S.: Fast alternating linearization methods for minimizing the sum of two convex functions. Math. Prog. A 141(1), 349–382 (2012)

  18. Grant, M., Boyd, S., Ye, Y.: Disciplined convex programming. In: Liberti, L., Maculan, N. (eds.) Global Optimization: From Theory to Implementation, Nonconvex Optimization and Its Applications, pp. 155–210. Springer, New York (2006)

  19. Necoara, I., Suykens, J.A.K.: Applications of a smoothing technique to decomposition in convex optimization. IEEE Trans. Autom. Control 53(11), 2674–2679 (2008)

  20. Nedelcu, V., Necoara, I., Tran-Dinh, Q.: Computational complexity of inexact gradient augmented Lagrangian methods: application to constrained MPC. SIAM J. Control Optim. 52(5), 3109–3134 (2014)

  21. Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence \(O(1/k^2)\). Doklady AN SSSR 269 (Soviet Math. Dokl.), 543–547 (1983)

  22. Nesterov, Y.: Introductory lectures on convex optimization: a basic course. Applied Optimization, vol. 87. Kluwer Academic Publishers, Dordrecht (2004)

  23. Nesterov, Y.: Excessive gap technique in nonsmooth convex minimization. SIAM J. Optim. 16(1), 235–249 (2005)

  24. Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Prog. 103(1), 127–152 (2005)

  25. Nesterov, Y.: Smoothing technique and its applications in semidefinite optimization. Math. Prog. 110(2), 245–259 (2007)

  26. Nesterov, Y.: Gradient methods for minimizing composite objective function. Math. Prog. 140(1), 125–161 (2013)

  27. Orabona, F., Argyriou, A., Srebro, N.: PRISMA: proximal iterative smoothing algorithm. Tech. Report, pp. 1–21 (2012). http://arxiv.org/abs/1206.2372

  28. Parikh, N., Boyd, S.: Proximal algorithms. Found. Trends Optim. 1(3), 123–231 (2013)

  29. Rockafellar, R.T.: Monotropic programming: a generalization of linear programming and network programming. In: Rockafellar, R.T. (ed.) Convexity and Duality in Optimization, pp. 10–36. Springer, New York (1985)

  30. Tran-Dinh, Q.: Sequential Convex Programming and Decomposition Approaches for Nonlinear Optimization. PhD Thesis, Arenberg Doctoral School, KU Leuven, Department of Electrical Engineering (ESAT/SCD) and Optimization in Engineering Center, Kasteelpark Arenberg 10, 3001-Heverlee, Belgium (2012)

  31. Tran-Dinh, Q., Cevher, V.: A primal-dual algorithmic framework for constrained convex minimization, pp. 1–54. Tech. Report, LIONS (2014)

  32. Tran-Dinh, Q., Kyrillidis, A., Cevher, V.: Composite self-concordant minimization. J. Mach. Learn. Res. 15, 374–416 (2015)

  33. Tran-Dinh, Q., Savorgnan, C., Diehl, M.: Combining Lagrangian decomposition and excessive gap smoothing technique for solving large-scale separable convex optimization problems. Comput. Optim. Appl. 55(1), 75–111 (2013)

  34. Yang, J., Zhang, Y.: Alternating direction algorithms for \(\ell _1\)-problems in compressive sensing. SIAM J. Sci. Comput. 33(1–2), 250–278 (2011)

Acknowledgments

This research was partially supported by NSF, Grant No. 161-9884, and NAFOSTED, Grant No. 101.01-2014-24.

Author information


Corresponding author

Correspondence to Quoc Tran-Dinh.

Appendix: the proof of technical results

This appendix provides the full proofs of the technical results presented in the main text.

1.1 The proof of Lemma 2: descent property of the proximal gradient step

By using (9) with \(f_{\gamma }({x}) := \varphi ^{*}_{\gamma }({A}^{\top }{x})\), \(\nabla {f_{\gamma }}(\bar{{x}}) = {A}\nabla {\varphi _{\gamma }^{*}}({A}^{\top }\bar{{x}})\), \({z}:= {A}^{\top }{x}\), \(\bar{{z}} := {A}^{\top }\bar{{x}}\), and \(\Vert {A}^{\top }({x}- \bar{{x}})\Vert \le \Vert {A}\Vert \Vert {x}- \bar{{x}}\Vert \), we can show that

$$\begin{aligned} \frac{\gamma }{2}\Vert {u}^{*}_{\gamma }({A}^{\top }{x}) - {u}^{*}_{\gamma }({A}^{\top }\bar{{x}})\Vert ^2 \le f_{\gamma }({x}) - f_{\gamma }(\bar{{x}}) - \langle \nabla {f_{\gamma }}(\bar{{x}}), {x}- \bar{{x}}\rangle \le \frac{\Vert {A}\Vert ^2}{2\gamma }\Vert {x}- \bar{{x}}\Vert ^2. \end{aligned}$$
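In more detail (a sketch, assuming (9) is the standard pair of strong-convexity/Lipschitz-gradient estimates for the smoothed conjugate \(\varphi ^{*}_{\gamma }\) with \(\nabla {\varphi ^{*}_{\gamma }} = {u}^{*}_{\gamma }\)), the upper bound follows from

$$\begin{aligned} f_{\gamma }({x}) - f_{\gamma }(\bar{{x}}) - \langle \nabla {f_{\gamma }}(\bar{{x}}), {x}- \bar{{x}}\rangle = \varphi ^{*}_{\gamma }({z}) - \varphi ^{*}_{\gamma }(\bar{{z}}) - \langle \nabla {\varphi ^{*}_{\gamma }}(\bar{{z}}), {z}- \bar{{z}}\rangle \le \frac{1}{2\gamma }\Vert {z}- \bar{{z}}\Vert ^2 \le \frac{\Vert {A}\Vert ^2}{2\gamma }\Vert {x}- \bar{{x}}\Vert ^2, \end{aligned}$$

where the first equality uses \(\nabla {f_{\gamma }}(\bar{{x}}) = {A}\nabla {\varphi ^{*}_{\gamma }}({A}^{\top }\bar{{x}})\); the lower bound follows analogously from the strong-convexity estimate \(\frac{\gamma }{2}\Vert {u}^{*}_{\gamma }({z}) - {u}^{*}_{\gamma }(\bar{{z}})\Vert ^2 \le \varphi ^{*}_{\gamma }({z}) - \varphi ^{*}_{\gamma }(\bar{{z}}) - \langle \nabla {\varphi ^{*}_{\gamma }}(\bar{{z}}), {z}- \bar{{z}}\rangle \).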

Using this estimate, the proof of (14) follows along the same lines as in [31]. \(\square \)

1.2 The proof of Lemma 3: key estimate

We first substitute \(\beta = \frac{\gamma _{k+1}}{\Vert {A}\Vert ^2}\) into (14) and use (15) to obtain

$$\begin{aligned} F_{\gamma _{k+1}}({x}^{k+1})\le & {} F_{\gamma _{k+1}}({x}^k) - \frac{\gamma _{k+1}}{2}\Vert {u}^{*}_{\gamma _{k+1}}({A}^{\top }{x}^k) - {u}^{*}_{\gamma _{k+1}}({A}^{\top }\hat{{x}}^k) \Vert ^2 \nonumber \\&+ \, \frac{\Vert {A}\Vert ^2}{\gamma _{k+1}}\left\langle {x}^{k+1} - \hat{{x}}^k, {x}^k - \hat{{x}}^k\right\rangle -\frac{\Vert {A}\Vert ^2}{2\gamma _{k+1}}\Vert \hat{{x}}^k - {x}^{k+1}\Vert ^2. \end{aligned}$$

Multiplying this inequality by \((1-\tau _k)\) and (14) by \(\tau _k\), and summing up the results we obtain

$$\begin{aligned} F_{\gamma _{k+1}}({x}^{k+1})\le & {} (1 - \tau _k)F_{\gamma _{k+1}}({x}^k) + \tau _k\hat{\ell }^k_{\gamma _{k+1}}({x}) \nonumber \\&-\frac{(1-\tau _k)\gamma _{k+1}}{2}\Vert {u}^{*}_{\gamma _{k+1}}({A}^{\top }{x}^k) - {u}^{*}_{\gamma _{k+1}}({A}^{\top }\hat{{x}}^k) \Vert ^2\nonumber \\&+ \, \frac{\Vert {A}\Vert ^2}{\gamma _{k+1}}\left\langle \hat{{x}}^k - {x}^{k+1}, \hat{{x}}^k - (1-\tau _k){x}^k - \tau _k{x}\right\rangle -\frac{\Vert {A}\Vert ^2}{2\gamma _{k+1}}\Vert \hat{{x}}^k - {x}^{k+1}\Vert ^2, \end{aligned}$$

where \(\hat{\ell }_{\gamma }^k({x}) := f_{\gamma }(\hat{{x}}^k) + \langle \nabla {f_{\gamma }}(\hat{{x}}^k), {x}- \hat{{x}}^k\rangle + g({x})\).

From (16), we have \(\tau _k\tilde{{x}}^k = \hat{{x}}^k - (1-\tau _k){x}^k\); hence, we can write this inequality as

$$\begin{aligned} F_{\gamma _{k+1}}({x}^{k+1})&\le (1 - \tau _k)F_{\gamma _{k+1}}({x}^k) + \tau _k\hat{\ell }^k_{\gamma _{k+1}}({x})\nonumber \\&\quad -\frac{(1-\tau _k)\gamma _{k+1}}{2}\Vert {u}^{*}_{\gamma _{k+1}}({A}^{\top }{x}^k) - {u}^{*}_{\gamma _{k+1}}({A}^{\top }\hat{{x}}^k) \Vert ^2\nonumber \\&\quad + \, \frac{\Vert {A}\Vert ^2\tau _k^2}{2\gamma _{k+1}}\Big [ \Vert \tilde{{x}}^k - {x}\Vert ^2 - \Vert \tilde{{x}}^k - \tau _k^{-1}(\hat{{x}}^k - {x}^{k+1}) - {x}\Vert ^2 \Big ]. \end{aligned}$$
(41)
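Here, the last step is nothing but the elementary identity \(\langle a, b\rangle - \frac{1}{2}\Vert a\Vert ^2 = \frac{1}{2}\Vert b\Vert ^2 - \frac{1}{2}\Vert b - a\Vert ^2\) applied with \(a := \hat{{x}}^k - {x}^{k+1}\) and \(b := \tau _k(\tilde{{x}}^k - {x})\); a brief verification reads

$$\begin{aligned} \left\langle \hat{{x}}^k - {x}^{k+1}, \tau _k(\tilde{{x}}^k - {x})\right\rangle - \frac{1}{2}\Vert \hat{{x}}^k - {x}^{k+1}\Vert ^2 = \frac{\tau _k^2}{2}\Big [ \Vert \tilde{{x}}^k - {x}\Vert ^2 - \Vert \tilde{{x}}^k - \tau _k^{-1}(\hat{{x}}^k - {x}^{k+1}) - {x}\Vert ^2 \Big ], \end{aligned}$$

which, after multiplying by \(\frac{\Vert {A}\Vert ^2}{\gamma _{k+1}}\), gives the last two terms of (41).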

Using (10) with \(\gamma := \gamma _{k+1}\), \(\hat{\gamma } := \gamma _k\) and \({z}:= {A}^{\top }{x}^k\), we get

$$\begin{aligned} f_{\gamma _{k+1}}({x}^k) \le f_{\gamma _k}({x}^k) + (\gamma _k - \gamma _{k+1})b_{\mathcal {U}}({u}^{*}_{\gamma _{k+1}}({A}^{\top }{x}^k)), \end{aligned}$$

which leads to (recall that \(F_{\gamma } = f_{\gamma } + g\)):

$$\begin{aligned} F_{\gamma _{k+1}}({x}^k) \le F_{\gamma _k}({x}^k) + (\gamma _k - \gamma _{k+1})b_{\mathcal {U}}({u}^{*}_{\gamma _{k+1}}({A}^{\top }{x}^k)). \end{aligned}$$
(42)

Next, we estimate \(\hat{\ell }_{\gamma _{k+1}}^k\). Using the definition of \(f_{\gamma }\) and \(\nabla {f}_{\gamma }\), we can deduce

$$\begin{aligned} {}\hat{\ell }_{\gamma _{k+1}}^k({x}):= & {} f_{\gamma _{k+1}}(\hat{{x}}^k) + \left\langle \nabla {f_{\gamma _{k+1}}}(\hat{{x}}^k), {x}- \hat{{x}}^k\right\rangle + g({x}) \nonumber \\= & {} \big \langle \hat{{x}}^k, {A}{u}^{*}_{\gamma _{k+1}}({A}^{\top }\hat{{x}}^k)\big \rangle - \varphi ({u}^{*}_{\gamma _{k+1}}({A}^{\top }\hat{{x}}^k)) - \gamma _{k+1}b_{\mathcal {U}}({u}^{*}_{\gamma _{k+1}}({A}^{\top }\hat{{x}}^k)) \nonumber \\&+ \, \big \langle {x}- \hat{{x}}^k, {A}{u}^{*}_{\gamma _{k+1}}({A}^{\top }\hat{{x}}^k)\big \rangle + g({x}) \nonumber \\= & {} \big \langle {x}, {A}{u}^{*}_{\gamma _{k+1}}({A}^{\top }\hat{{x}}^k)\big \rangle - \varphi ({u}^{*}_{\gamma _{k+1}}({A}^{\top }\hat{{x}}^k)) - \gamma _{k+1}b_{\mathcal {U}}({u}^{*}_{\gamma _{k+1}}({A}^{\top }\hat{{x}}^k)) + g({x}) \nonumber \\\le & {} \displaystyle \max _{{u}}\left\{ \big \langle {x}, {A}{u}\big \rangle - \varphi ({u}) : {u}\in \mathcal {U}\right\} - \gamma _{k+1}b_{\mathcal {U}}({u}^{*}_{\gamma _{k+1}}({A}^{\top }\hat{{x}}^k)) + g({x}) \nonumber \\= & {} F({x}) - \gamma _{k+1}b_{\mathcal {U}}({u}^{*}_{\gamma _{k+1}}({A}^{\top }\hat{{x}}^k)). \end{aligned}$$
(43)

Substituting \(\tilde{{x}}^{k+1} := \tilde{{x}}^k - \frac{1}{\tau _k}(\hat{{x}}^k - {x}^{k+1})\) from the third line of (16), together with (42) and (43), into (41), we can derive

$$\begin{aligned} F_{\gamma _{k+1}}({x}^{k+1})\le & {} (1- \tau _k)F_{\gamma _{k}}({x}^k) + \tau _kF({x}) \\&+ \frac{\Vert {A}\Vert ^2\tau _k^2}{2\gamma _{k+1}}\left[ \Vert \tilde{{x}}^k - {x}\Vert ^2 - \Vert \tilde{{x}}^{k+1} - {x}\Vert ^2\right] - R_k, \end{aligned}$$

which is indeed (18), where \(R_k\) is given by (19).

Finally, we prove (20). Indeed, using the strong convexity and the \(L_b\)-smoothness of \(b_{\mathcal {U}}\), we can bound \(R_k\) from below as

$$\begin{aligned} R_k\ge & {} \frac{\tau _k\gamma _{k+1}}{2}\Vert {u}^{*}_{\gamma _{k+1}}({A}^{\top }\hat{{x}}^k) - \bar{{u}}^c\Vert ^2 + \frac{(1-\tau _k)\gamma _{k+1}}{2}\Vert {u}^{*}_{\gamma _{k+1}}({A}^{\top }{x}^k) - {u}^{*}_{\gamma _{k+1}}({A}^{\top }\hat{{x}}^k)\Vert ^2\\&-\frac{L_b}{2}(1-\tau _k)(\gamma _k - \gamma _{k+1})\Vert {u}^{*}_{\gamma _{k+1}}({A}^{\top }{x}^k) - \bar{{u}}^c\Vert ^2. \end{aligned}$$
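Here, the only properties of the prox-function needed are the two standard bounds (a sketch, assuming, as usual, that \(b_{\mathcal {U}}\) is \(1\)-strongly convex and \(L_b\)-smooth with minimizer \(\bar{{u}}^c\) and \(b_{\mathcal {U}}(\bar{{u}}^c) = 0\)):

$$\begin{aligned} \frac{1}{2}\Vert {u}- \bar{{u}}^c\Vert ^2 \le b_{\mathcal {U}}({u}) \le \frac{L_b}{2}\Vert {u}- \bar{{u}}^c\Vert ^2, \quad \forall {u}\in \mathcal {U}, \end{aligned}$$

applied to the \(b_{\mathcal {U}}\)-terms in (19), with \({u}= {u}^{*}_{\gamma _{k+1}}({A}^{\top }\hat{{x}}^k)\) for the lower bound and \({u}= {u}^{*}_{\gamma _{k+1}}({A}^{\top }{x}^k)\) for the upper bound.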

Letting \(\hat{{v}}_k := {u}^{*}_{\gamma _{k+1}}({A}^{\top }\hat{{x}}^k) - \bar{{u}}^c\) and \( {v}_k := {u}^{*}_{\gamma _{k+1}}({A}^{\top }{x}^k) - \bar{{u}}^c\), we can further bound \(R_k\) as

$$\begin{aligned} 2\gamma _{k+1}^{-1}R_k\ge & {} \tau _k\Vert \hat{{v}}_k \Vert ^2 + (1-\tau _k)\Vert \hat{{v}}_k - {v}_k\Vert ^2 - (1-\tau _k)(\gamma _{k+1}^{-1}\gamma _k - 1)L_b\Vert {v}_k\Vert ^2 \\= & {} \Vert \hat{{v}}_k\Vert ^2 - 2(1-\tau _k)\langle \hat{{v}}_k ,{v}_k \rangle + (1-\tau _k)\big [1 - (\gamma _{k+1}^{-1}\gamma _k - 1)L_b\big ]\Vert {v}_k\Vert ^2 \\= & {} \Vert \hat{{v}}_k - (1-\tau _k){v}_k\Vert ^2 + (1-\tau _k)\left[ \tau _k - (\gamma _{k+1}^{-1}\gamma _k - 1)L_b\right] \Vert {v}_k\Vert ^2 \\\ge & {} (1-\tau _k)\left[ \tau _k - (\gamma _{k+1}^{-1}\gamma _k - 1)L_b\right] \Vert {v}_k\Vert ^2, \end{aligned}$$

which obviously implies (20). \(\square \)

1.3 The proof of Lemma 4: the choice of parameters

First, using the update rules of \(\tau _k\) and \(\gamma _k\) in (21), we can express the quantity \(m_k\) as

$$\begin{aligned} m_k&:= \frac{\gamma _{k+1}(1-\tau _k)\left[ \gamma _{k+1}\tau _k - L_b(\gamma _k - \gamma _{k+1})\right] }{\tau _k^2} = -\frac{\gamma _1^2\bar{c}^2\left[ (L_b-1)(k + \bar{c}) + 1\right] }{(k+\bar{c})^2}. \end{aligned}$$
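For the reader's convenience, this identity follows by direct substitution of the update rules in (21), namely \(\tau _k = \frac{1}{k+\bar{c}}\) and \(\gamma _{k+1} = \frac{\gamma _1\bar{c}}{k+\bar{c}}\), so that \(\gamma _k - \gamma _{k+1} = \frac{\gamma _1\bar{c}}{(k+\bar{c}-1)(k+\bar{c})}\):

$$\begin{aligned} \frac{\gamma _{k+1}(1-\tau _k)}{\tau _k^2} = \gamma _1\bar{c}(k+\bar{c}-1), \qquad \gamma _{k+1}\tau _k - L_b(\gamma _k - \gamma _{k+1}) = \frac{\gamma _1\bar{c}}{(k+\bar{c})^2} - \frac{L_b\gamma _1\bar{c}}{(k+\bar{c}-1)(k+\bar{c})}, \end{aligned}$$

so that

$$\begin{aligned} m_k = \gamma _1^2\bar{c}^2\left[ \frac{k+\bar{c}-1}{(k+\bar{c})^2} - \frac{L_b}{k+\bar{c}}\right] = \frac{\gamma _1^2\bar{c}^2\left[ (k+\bar{c}-1) - L_b(k+\bar{c})\right] }{(k+\bar{c})^2} = -\frac{\gamma _1^2\bar{c}^2\left[ (L_b-1)(k+\bar{c}) + 1\right] }{(k+\bar{c})^2}. \end{aligned}$$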

Moreover, it follows from the properties of \(b_{\mathcal {U}}\) that

$$\begin{aligned} \frac{1}{2}\Vert {u}^{*}_{\gamma _{k+1}}({A}^{\top }{x}^k) -\bar{{u}}^c\Vert ^2 \le b_{\mathcal {U}}({u}^{*}_{\gamma _{k+1}}({A}^{\top }{x}^k)) \le D_{\mathcal {U}}. \end{aligned}$$

Multiplying the lower bound (20) by \(\frac{\gamma _{k+1}}{\tau _k^2}\) and combining the result with the last inequality and the estimate of \(m_k\), we obtain the first lower bound in (22).

Next, using the update rules (21) of \(\tau _k\) and \(\gamma _k\), we have \(\frac{(1-\tau _k)\gamma _{k+1}}{\tau _k^2} = \frac{(k+\bar{c}-1)}{(k+\bar{c})}\cdot \frac{\gamma _1\bar{c}}{(k+\bar{c})}\cdot (k+\bar{c})^2 = \gamma _1\bar{c}(k+\bar{c}-1) = \frac{\gamma _k}{\tau _{k-1}^2}\), which is the second equality in (22).

Using (18) with the lower bound of \(R_k\) from (22), we have

$$\begin{aligned} \frac{\gamma _{k+1}}{\tau _k^2}\Delta {F}_{k+1} + \frac{\Vert {A}\Vert ^2}{2}\Vert \tilde{{x}}^{k+1} - {x}^{\star }\Vert ^2 \le \frac{(1 - \tau _k)\gamma _{k+1}}{\tau _k^2}\Delta {F}_k + \frac{\Vert {A}\Vert ^2}{2}\Vert \tilde{{x}}^k - {x}^{\star }\Vert ^2 + s_kD_{\mathcal {U}}, \end{aligned}$$
(44)

where \(\Delta {F}_k := F_{\gamma _k}({x}^k) - F^{\star }\) and \(s_k := \frac{\gamma _1^2\bar{c}^2\left[ (L_b-1)(k + \bar{c}) + 1\right] }{(k + \bar{c})^2}\). Using this inequality and the relation \(\frac{(1-\tau _k)\gamma _{k+1}}{\tau _k^2} = \frac{\gamma _k}{\tau _{k-1}^2}\) in (22), we can easily show that

$$\begin{aligned} \frac{\gamma _{k+1}}{\tau _k^2}\Delta {F}_{k+1} + \frac{\Vert {A}\Vert ^2}{2}\Vert \tilde{{x}}^{k+1} - {x}^{\star }\Vert ^2 \le \frac{\gamma _{k}}{\tau _{k-1}^2}\Delta {F}_k + \frac{\Vert {A}\Vert ^2}{2}\Vert \tilde{{x}}^k - {x}^{\star }\Vert ^2 + s_kD_{\mathcal {U}}. \end{aligned}$$

By induction, we obtain from the last inequality that

$$\begin{aligned} \frac{\gamma _{k+1}}{\tau _k^2}\Delta {F}_{k+1} + \frac{\Vert {A}\Vert ^2}{2}\Vert \tilde{{x}}^{k+1} - {x}^{\star }\Vert ^2 \le \frac{(1-\tau _0)\gamma _1}{\tau _0^2}\Delta {F}_0 + \frac{\Vert {A}\Vert ^2}{2}\Vert \tilde{{x}}^0 - {x}^{\star }\Vert ^2 + S_kD_{\mathcal {U}}, \end{aligned}$$
(45)

which implies (23), where \(S_k := \sum _{i=0}^ks_i = \gamma _1^2\bar{c}^2\sum _{i=0}^k\frac{\left[ (L_b-1)(i + \bar{c}) + 1\right] }{(i + \bar{c})^2}\).

Finally, to prove (24), we use two elementary inequalities \(\sum _{i=1}^{k+\bar{c}}\frac{1}{i} < 1 + \ln (k+\bar{c})\) and \(\sum _{i=0}^k\frac{1}{(i+\bar{c})^2} \le \frac{1}{\bar{c}^2} + \sum _{i=1}^k\frac{1}{(i+\bar{c}-1)(i+\bar{c})} < \frac{1}{\bar{c}^2} + \frac{1}{\bar{c}}\). \(\square \)
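For completeness, the second of these elementary bounds follows from the telescoping identity

$$\begin{aligned} \sum _{i=1}^k\frac{1}{(i+\bar{c}-1)(i+\bar{c})} = \sum _{i=1}^k\left[ \frac{1}{i+\bar{c}-1} - \frac{1}{i+\bar{c}}\right] = \frac{1}{\bar{c}} - \frac{1}{k+\bar{c}} < \frac{1}{\bar{c}}. \end{aligned}$$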

1.4 The proof of Corollary 1: the smooth accelerated gradient method

First, similarly to the proof of (44), we can derive

$$\begin{aligned}&\frac{\gamma _{k+1}}{(L_g\gamma _{k+1} + \Vert {A}\Vert ^2)\tau _k^2}\Delta {F}_{k+1} + \frac{1}{2}\Vert \tilde{{x}}^{k+1} - {x}^{\star }\Vert ^2 \nonumber \\&\quad \le \frac{(1 - \tau _k)\gamma _{k+1}}{(L_g\gamma _{k+1} + \Vert {A}\Vert ^2)\tau _k^2}\Delta {F}_k + \frac{1}{2}\Vert \tilde{{x}}^k - {x}^{\star }\Vert ^2 + \hat{s}_kD_{\mathcal {U}}, \end{aligned}$$

where \(\Delta {F}_k := F_{\gamma _{k}}({x}^k) - F^{\star }\), and \(\hat{s}_k := \frac{\gamma _{k+1}(1-\tau _k)\left[ L_b(\gamma _k - \gamma _{k+1}) - \gamma _{k+1}\tau _k\right] }{\tau _k^2(L_g\gamma _{k+1}+\Vert {A}\Vert ^2)}\).

Next, we impose the condition \(\frac{(1-\tau _k)\gamma _{k+1}}{\tau _k^2(L_g\gamma _{k+1} + \Vert {A}\Vert ^2)} = \frac{\gamma _k}{\tau _{k-1}^2(L_g\gamma _k + \Vert {A}\Vert ^2)}\) and choose \(\tau _k = \frac{1}{k+1}\). This condition then gives \(\gamma _{k+1} = \frac{k\gamma _k\Vert {A}\Vert ^2}{L_g\gamma _k + \Vert {A}\Vert ^2(k+1)}\). We first show that \(\gamma _{k+1} \le \frac{\gamma _1}{k+1}\). Indeed, we have \(\frac{1}{\gamma _{k+1}} = \left( \frac{k+1}{k}\right) \frac{1}{\gamma _k} + \frac{L_g}{\Vert {A}\Vert ^2k} \ge \left( \frac{k+1}{k}\right) \frac{1}{\gamma _k}\), which implies \(\gamma _{k+1} \le \frac{k}{k+1}\gamma _k\); by induction, we get \(\gamma _{k+1} \le \frac{\gamma _1}{k+1}\). On the other hand, for \(k\ge 2\), the estimate \(\frac{1}{\gamma _{k+1}} = \left( \frac{k+1}{k}\right) \frac{1}{\gamma _k} + \frac{L_g}{\Vert {A}\Vert ^2k} \le \frac{1}{\gamma _k} \left( \frac{k}{k-1}\right) \) is equivalent to \(\gamma _k \le \frac{\Vert {A}\Vert ^2}{L_g(k-1)}\), which holds since \(\gamma _k \le \frac{\gamma _1}{k} = \frac{\Vert {A}\Vert ^2}{L_gk}\) due to the choice \(\gamma _1 := \frac{\Vert {A}\Vert ^2}{L_g}\). Hence, the estimate \(\frac{1}{\gamma _{k+1}} \le \frac{1}{\gamma _k} \left( \frac{k}{k-1}\right) \) and the update rule of \(\gamma _k\) eventually imply

$$\begin{aligned} \frac{\gamma _1\Vert {A}\Vert ^2}{(L_g\gamma _1 + 2\Vert {A}\Vert ^2)k} = \frac{\gamma _2}{k} \le \gamma _{k+1} \le \frac{\gamma _1}{k+1},~~\forall k\ge 1. \end{aligned}$$
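The lower bound in this sandwich can be verified by telescoping the estimate \(\frac{1}{\gamma _{i+1}} \le \left( \frac{i}{i-1}\right) \frac{1}{\gamma _i}\) over \(i = 2,\dots ,k\):

$$\begin{aligned} \frac{1}{\gamma _{k+1}} \le \frac{1}{\gamma _2}\prod _{i=2}^{k}\frac{i}{i-1} = \frac{k}{\gamma _2} \quad \Longrightarrow \quad \gamma _{k+1} \ge \frac{\gamma _2}{k} = \frac{\gamma _1\Vert {A}\Vert ^2}{(L_g\gamma _1 + 2\Vert {A}\Vert ^2)k}, \end{aligned}$$

where \(\gamma _2 = \frac{\gamma _1\Vert {A}\Vert ^2}{L_g\gamma _1 + 2\Vert {A}\Vert ^2}\) is obtained from the update rule of \(\gamma _k\) with \(k=1\).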

This condition leads to \(\frac{\tau _{k-1}^2(L_g\gamma _k + \Vert {A}\Vert ^2)}{\gamma _k} = L_g\tau _{k-1}^2 + \frac{\tau _{k-1}^2}{\gamma _k}\Vert {A}\Vert ^2 \le \frac{L_g}{k^2} + \frac{3L_g(k-1)}{k^2} \le \frac{3L_g}{k}\). Using the estimates of \(\tau _k\) and \(\gamma _k\), we can easily show that \(\hat{s}_k \le \frac{L_b\Vert {A}\Vert ^2}{L_g^2k(k+2)} + \frac{L_b\Vert {A}\Vert ^2}{L_g^2(k+1)(k+2)} + \frac{(L_b-1)\Vert {A}\Vert ^2k}{L_g(k+1)(k+2)}\). Hence, we can prove that

$$\begin{aligned} \hat{S}_k:= & {} \sum _{i=0}^k\hat{s}_i \le \frac{2L_b\Vert {A}\Vert ^2}{L_g^2} + \frac{(L_b-1)\Vert {A}\Vert ^2}{L_g^2}\sum _{i=0}^k\frac{1}{i+1} \nonumber \\\le & {} \frac{2L_b\Vert {A}\Vert ^2}{L_g^2} + \frac{(L_b-1)\Vert {A}\Vert ^2}{L_g^2}\left( \ln (k) + 1\right) . \end{aligned}$$

Using this estimate, we can show that

$$\begin{aligned} F_{\gamma _k}({x}^k) - F^{\star }\le & {} \frac{3L_g}{2k}\Vert {x}^0 - {x}^{\star }\Vert ^2 + \frac{2L_b\Vert {A}\Vert ^2}{L_g^2k}D_{\mathcal {U}} \nonumber \\&+ \frac{(L_b-1)\Vert {A}\Vert ^2}{L_g^2k}\left( \ln (k) + 1\right) D_{\mathcal {U}}. \end{aligned}$$

Finally, using the bound (11) and \(\gamma _k \le \frac{\Vert {A}\Vert ^2}{L_g(k+1)} < \frac{\Vert {A}\Vert ^2}{L_gk}\), we obtain (32). \(\square \)

1.5 The proof of Theorem 2: primal solution recovery

Let \(\Delta {F}_k := F_{\gamma _{k}}({x}^k) - F^{\star }\). Then, by (11), we have \(\Delta {F}_k \ge F({x}^k) - F^{\star }- \gamma _kD_{\mathcal {U}} \ge -\gamma _kD_{\mathcal {U}}\). Similar to the proof of Lemma 4, we can prove that

$$\begin{aligned} \frac{\gamma _{i+1}}{\tau _i^2}\Delta {F}_{i+1}\le & {} \frac{\gamma _i}{\tau _{i-1}^2}\Delta {F}_i + \frac{\gamma _{i+1}}{\tau _i}\Delta {\hat{\ell }}_{\gamma _{i+1}}({x})\nonumber \\&+ \frac{\Vert {A}\Vert ^2}{2}\left( \Vert \tilde{{x}}^i - {x}\Vert ^2 - \Vert \tilde{{x}}^{i+1} - {x}\Vert ^2\right) + s_iD_{\mathcal {U}}, \end{aligned}$$
(46)

where \(s_i := \frac{\gamma _1^2\bar{c}^2\left[ (L_b-1)(i + \bar{c}) + 1\right] }{(i + \bar{c})^2}\) is as in the proof of Lemma 4, and \(\Delta {\hat{\ell }}_{\gamma _{k+1}}^k({x}) = \big \langle {x}, {A}{u}^{*}_{\gamma _{k+1}}(\hat{{x}}^k)\big \rangle - \varphi ({u}^{*}_{\gamma _{k+1}}(\hat{{x}}^k)) + g({x}) - F^{\star }= \big \langle {x}, {A}{u}^{*}_{\gamma _{k+1}}(\hat{{x}}^k) - {b}\big \rangle - \varphi ({u}^{*}_{\gamma _{k+1}}(\hat{{x}}^k)) + s_{\mathcal {K}}({x}) - F^{\star }\). Summing up this inequality from \(i=1\) to \(i=k\) and using \(\tau _0 = 1\) and \(\tilde{{x}}^0 ={x}^0\), we obtain

$$\begin{aligned} {}\frac{\gamma _{k+1}}{\tau _k^2}\Delta {F}_{k+1}\le & {} \sum _{i=1}^k\frac{\gamma _{i+1}}{\tau _i}\Delta {\hat{\ell }}_{\gamma _{i+1}}({x}) + \frac{\Vert {A}\Vert ^2}{2}\left( \Vert \tilde{{x}}^1 - {x}\Vert ^2 - \Vert \tilde{{x}}^{k+1} - {x}\Vert ^2\right) \nonumber \\&+\,\gamma _1\Delta {F}_1 + S_kD_{\mathcal {U}},{} \end{aligned}$$
(47)

where \(S_k := \sum _{i=1}^ks_i\). Now, using again (46) with \(k=1\), \({x}^0 = \tilde{{x}}^0\) and \(\tau _0 = 1\), we get \(\gamma _1\Delta {F}_1 \le \gamma _1\tau _0\Delta {\hat{\ell }}_{\gamma _1}(x) + \frac{\Vert {A}\Vert ^2}{2}\left( \Vert {x}^0 - {x}^{\star }\Vert ^2 - \Vert \tilde{{x}}^1 - {x}^{\star }\Vert ^2\right) \). Substituting this into (47), we obtain

$$\begin{aligned} \frac{\gamma _{k+1}}{\tau _k^2}\Delta {F}_{k+1}\le & {} \sum _{i=0}^k\frac{\gamma _{i+1}}{\tau _i}\left( \big \langle {x}, {A}{u}^{*}_{\gamma _{i+1}}(\hat{{x}}^i) - {b}\big \rangle - \varphi ({u}^{*}_{\gamma _{i+1}}(\hat{{x}}^i)) + s_{\mathcal {K}}({x})- F^{\star }\right) \nonumber \\&+ \frac{\Vert {A}\Vert ^2}{2}\Vert {x}^0 - {x}\Vert ^2 + S_kD_{\mathcal {U}}. \end{aligned}$$
(48)

Using \(\bar{{u}}^k\) defined by (35) with weights \(w_i :=\frac{\gamma _{i+1}}{\tau _i}\) and the convexity of \(\varphi \), we have

$$\begin{aligned} \sum _{i=0}^k\frac{\gamma _{i+1}}{\tau _i}\left( \big \langle {x}, {A}{u}^{*}_{\gamma _{i+1}}(\hat{{x}}^i) - {b}\big \rangle - \varphi ({u}^{*}_{\gamma _{i+1}}(\hat{{x}}^i))\right) \le \varGamma _k\left( \big \langle {x}, {A}\bar{{u}}^k - {b}\big \rangle - \varphi (\bar{{u}}^k)\right) . \end{aligned}$$
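This is a direct consequence of Jensen's inequality: assuming, as in (35), that \(\bar{{u}}^k = \varGamma _k^{-1}\sum _{i=0}^k w_i{u}^{*}_{\gamma _{i+1}}(\hat{{x}}^i)\) with \(\varGamma _k := \sum _{i=0}^k w_i\), the linearity of \({u}\mapsto \big \langle {x}, {A}{u}- {b}\big \rangle \) and the convexity of \(\varphi \) give

$$\begin{aligned} \sum _{i=0}^k w_i\big \langle {x}, {A}{u}^{*}_{\gamma _{i+1}}(\hat{{x}}^i) - {b}\big \rangle = \varGamma _k\big \langle {x}, {A}\bar{{u}}^k - {b}\big \rangle \quad \text {and}\quad \sum _{i=0}^k w_i\varphi ({u}^{*}_{\gamma _{i+1}}(\hat{{x}}^i)) \ge \varGamma _k\varphi (\bar{{u}}^k). \end{aligned}$$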

Substituting this into (48) and then using \(\Delta {F}_{k+1} \ge -\gamma _{k+1}D_{\mathcal {U}}\) we get

$$\begin{aligned} -\frac{\gamma _{k+1}^2}{\tau _k^2}D_{\mathcal {U}} \le \varGamma _k\left( \big \langle {x}, {A}\bar{{u}}^k - {b}\big \rangle - \varphi (\bar{{u}}^k) + s_{\mathcal {K}}({x}) - F^{\star }\right) + \frac{\Vert {A}\Vert ^2}{2}\Vert {x}^0 - {x}\Vert ^2 \!+\! S_kD_{\mathcal {U}}, \end{aligned}$$

which implies

$$\begin{aligned} F^{\star }\le \big \langle {x}, {A}\bar{{u}}^k - {b}\big \rangle - \varphi (\bar{{u}}^k) + s_{\mathcal {K}}({x}) + \frac{\Vert {A}\Vert ^2}{2\varGamma _k}\Vert {x}^0 - {x}\Vert ^2 + \frac{D_{\mathcal {U}}}{\varGamma _k}\left( \frac{\gamma _{k+1}^2}{\tau _k^2} + S_k\right) . \end{aligned}$$

Rearranging this inequality, we get

$$\begin{aligned} \inf _{{r}\in \mathcal {K}}\big \langle {x}, {b}- {A}\bar{{u}}^k - {r}\big \rangle + \varphi (\bar{{u}}^k) \le -F^{\star }+ \frac{\Vert {A}\Vert ^2}{2\varGamma _k}\Vert {x}^0 - {x}\Vert ^2 + \frac{D_{\mathcal {U}}}{\varGamma _k}\left( \frac{\gamma _{k+1}^2}{\tau _k^2} + S_k\right) , \end{aligned}$$
(49)

where we use the relation \(-s_{\mathcal {K}}({x}) = -\sup _{{r}\in \mathcal {K}}\langle {x}, {r}\rangle = \inf _{{r}\in \mathcal {K}}\langle {x}, -{r}\rangle \). On the other hand, by the saddle point theory for the primal and dual problems (33) and (34), for any optimal solution \({x}^{\star }\), we can show that

$$\begin{aligned} -F^{\star } = \varphi ^{\star } \le \varphi ({u}) - \langle {x}^{\star }, {A}{u}- {b}+ {r}\rangle , ~\forall {u}\in \mathcal {U},~{r}\in \mathcal {K}. \end{aligned}$$

Since this inequality holds for any \({r}\in \mathcal {K}\) and \({u}\in \mathcal {U}\), taking \({u}= \bar{{u}}^k\) and the infimum over \({r}\in \mathcal {K}\) leads to

$$\begin{aligned} \inf _{{r}\in \mathcal {K}}\big \langle {x}^{\star }, {A}\bar{{u}}^k - {b}+ {r}\big \rangle - \varphi (\bar{{u}}^k) \le F^{\star }. \end{aligned}$$
(50)

Combining (49) and (50) yields

$$\begin{aligned}&\min _{{r}\in \mathcal {K}}\Big \{\big \langle {x}^{\star }- {x}, {r}+ {A}\bar{{u}}^k - {b}\big \rangle - \frac{\Vert {A}\Vert ^2}{2\varGamma _k}\Vert {x}^0 - {x}\Vert ^2\Big \} \le \frac{D_{\mathcal {U}}}{\varGamma _k}\left( \frac{\gamma _{k+1}^2}{\tau _k^2} + S_k\right) , ~~\forall {x}\in \mathbb {R}^p.\nonumber \\ \end{aligned}$$
(51)

Taking \({x}:= {x}^0 - \Vert {A}\Vert ^{-2}\varGamma _k({A}\bar{{u}}^k - {b}+ {r})\) for any \({r}\in \mathcal {K}\), we obtain from (51) that

$$\begin{aligned} \min _{{r}\in \mathcal {K}}\left\{ \frac{\varGamma _k}{\Vert {A}\Vert ^2}\Vert {A}\bar{{u}}^k + {r}- {b}\Vert ^2 + 2\big \langle {A}\bar{{u}}^k - {b}+ {r}, {x}^{\star }- {x}^0\big \rangle \right\} \le \frac{2D_{\mathcal {U}}}{\varGamma _k}\left( \frac{ \gamma _{k+1}^2}{\tau _k^2} + S_k\right) , \end{aligned}$$

which implies (by the Cauchy–Schwarz inequality)

$$\begin{aligned}&\min _{{r}\in \mathcal {K}}\left\{ \varGamma _k\Vert {A}\bar{{u}}^k - {b}+ {r}\Vert ^2 - 2\Vert {A}\Vert ^2\Vert {A}\bar{{u}}^k - {b}+ {r}\Vert \Vert {x}^{\star }- {x}^0\Vert \right\} \\&\quad \le \frac{2\Vert {A}\Vert ^2D_{\mathcal {U}}}{\varGamma _k}\left( \frac{\gamma _{k+1}^2}{\tau _k^2} + S_k\right) . \end{aligned}$$

By elementary calculations and \(\mathrm {dist}\left( {b}- {A}\bar{{u}}^k, \mathcal {K}\right) = \inf \left\{ \Vert {A}\bar{{u}}^k-{b}+ {r}\Vert : {r}\in \mathcal {K}\right\} \), we can show from the last inequality that

$$\begin{aligned} \mathrm {dist}\left( {b}- {A}\bar{{u}}^k, \mathcal {K}\right) \le \frac{\Vert {A}\Vert ^2}{\varGamma _k}\Big [\Vert {x}^0 - {x}^{\star }\Vert + \sqrt{\Vert {x}^0 - {x}^{\star }\Vert ^2 + \frac{2}{\Vert {A}\Vert ^2}\Big ( S_k+\frac{\gamma _{k+1}^2}{\tau _k^2}\Big )D_{\mathcal {U}}}\Big ]. \end{aligned}$$
(52)
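The elementary calculations here amount to solving a quadratic inequality (a brief sketch): for a (near-)minimizing \({r}\in \mathcal {K}\), the quantity \(s := \Vert {A}\bar{{u}}^k - {b}+ {r}\Vert \) satisfies \(\varGamma _ks^2 - 2\Vert {A}\Vert ^2\Vert {x}^0 - {x}^{\star }\Vert s \le \frac{2\Vert {A}\Vert ^2D_{\mathcal {U}}}{\varGamma _k}\big (\frac{\gamma _{k+1}^2}{\tau _k^2} + S_k\big )\); hence \(s\) is at most the larger root of the corresponding quadratic equation, and since \(\mathrm {dist}\left( {b}- {A}\bar{{u}}^k, \mathcal {K}\right) \le s\), this gives

$$\begin{aligned} \mathrm {dist}\left( {b}- {A}\bar{{u}}^k, \mathcal {K}\right) \le \frac{\Vert {A}\Vert ^2\Vert {x}^0 - {x}^{\star }\Vert + \sqrt{\Vert {A}\Vert ^4\Vert {x}^0 - {x}^{\star }\Vert ^2 + 2\Vert {A}\Vert ^2\big (S_k + \frac{\gamma _{k+1}^2}{\tau _k^2}\big )D_{\mathcal {U}}}}{\varGamma _k}, \end{aligned}$$

which is exactly (52) after factoring \(\Vert {A}\Vert ^2\) out of the square root.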

To prove the first estimate of (37), we use (49) with \({x}= \varvec{0}^p\) and \(F^{\star } = -\varphi ^{\star }\) to get

$$\begin{aligned} \varphi (\bar{{u}}^k) - \varphi ^{\star } \le \frac{1}{\varGamma _k}\Big [\frac{\Vert {A}\Vert ^2}{2}\Vert {x}^0\Vert ^2 + \Big (\frac{\gamma _{k+1}^2}{\tau _k^2} + S_k\Big )D_{\mathcal {U}}\Big ]. \end{aligned}$$
(53)

Since we apply Algorithm 1(c) to solve the dual problem (34) using \(b_{\mathcal {U}}\) such that \(L_b = 1\), we have \(S_k \le 2\gamma _1^2\) (cf. (4)). Then, by using \(\gamma _{k+1} = \frac{\bar{c}\gamma _1}{k+\bar{c}}\) and \(\tau _k := \frac{1}{k+\bar{c}}\), we have \(\frac{\gamma _{k+1}}{\tau _k} = \gamma _1\bar{c}\), and hence \(\frac{\gamma _{k+1}^2}{\tau _k^2} = \gamma _1^2\bar{c}^2\). Moreover, we also have \(\varGamma _k := \sum _{i=0}^k\frac{\gamma _{i+1}}{\tau _i} = \gamma _1\bar{c}(k+1)\). Substituting these estimates and \(S_k \le 2\gamma _1^2\) into (52) and (53), we obtain (37). For the left-hand side inequality in the first estimate of (37), we use the simple bound \(-\Vert {x}^{\star }\Vert \mathrm {dist}\left( {b}- {A}{u},\mathcal {K}\right) \le \varphi ({u}) - \varphi ^{\star }\) for \({u}= \bar{{u}}^k \in \mathcal {U}\), which follows from the saddle point theory as in (50). \(\square \)



Cite this article

Tran-Dinh, Q. Adaptive smoothing algorithms for nonsmooth composite convex minimization. Comput Optim Appl 66, 425–451 (2017). https://doi.org/10.1007/s10589-016-9873-6

