Abstract
We propose an adaptive smoothing algorithm based on Nesterov’s smoothing technique in Nesterov (Math Prog 103(1):127–152, 2005) for solving “fully” nonsmooth composite convex optimization problems. Our method combines Nesterov’s accelerated proximal gradient scheme with a new homotopy strategy for the smoothness parameter. By an appropriate choice of smoothing functions, we develop a new algorithm that achieves the \(\mathcal {O}\left( \frac{1}{\varepsilon }\right) \) worst-case iteration-complexity while preserving the same per-iteration complexity as in Nesterov’s method and automatically updating the smoothness parameter at each iteration. We then customize our algorithm to solve four special cases that cover various applications. We also specialize our algorithm to constrained convex optimization problems and establish a convergence guarantee on the primal sequence of iterates. We demonstrate our algorithm through three numerical examples and compare it with other related algorithms.
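To make the idea concrete, the following is a minimal Python sketch of the general scheme described above: an accelerated proximal gradient loop applied to a smoothed objective whose smoothness parameter decreases along the iterations. It is not the paper's Algorithm 1; the toy problem \(\min_x \Vert Ax-b\Vert_1 + \lambda \Vert x\Vert_1\), the Huber smoothing of the first term, and the specific updates \(\gamma_k = \gamma_1/k\), \(\tau_k = 1/k\) are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t * ||.||_1.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def grad_huber(r, gamma):
    # Gradient of the componentwise Huber smoothing of ||.||_1.
    return np.clip(r / gamma, -1.0, 1.0)

def adaptive_smoothing_apg(A, b, lam, gamma1, iters=500):
    # Accelerated proximal gradient on the smoothed objective, with a
    # decreasing smoothness parameter gamma_k (a simple homotopy).
    x = np.zeros(A.shape[1])
    x_tilde = x.copy()
    normA2 = np.linalg.norm(A, 2) ** 2       # ||A||^2 (squared spectral norm)
    for k in range(1, iters + 1):
        gamma = gamma1 / k                   # illustrative homotopy rule
        tau = 1.0 / k                        # illustrative averaging weight
        x_hat = (1.0 - tau) * x + tau * x_tilde
        L = normA2 / gamma                   # Lipschitz constant of the smoothed gradient
        grad = A.T @ grad_huber(A @ x_hat - b, gamma)
        x_next = soft_threshold(x_hat - grad / L, lam / L)
        x_tilde = x_tilde - (x_hat - x_next) / tau
        x = x_next
    return x

# Example: A = np.random.randn(20, 50); b = np.random.randn(20)
# x_sol = adaptive_smoothing_apg(A, b, lam=0.1, gamma1=1.0)
```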
References
Argyriou, A., Signoretto, M., Suykens, J.A.K.: Hybrid conditional gradient-smoothing algorithms with applications to sparse and low rank regularization. In: Suykens, J.A.K., Signoretto, M., Argyriou, A. (eds.) Regularization, Optimization, Kernels, and Support Vector Machines, pp. 53–82. CRC Press, Boca Raton (2014)
Baes, M., Bürgisser, M.: Smoothing techniques for solving semi-definite programs with many constraints. Optimization Online (2009)
Bauschke, H.H., Combettes, P.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York (2011)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Beck, A., Teboulle, M.: Smoothing and first order methods: a unified framework. SIAM J. Optim. 22(2), 557–580 (2012)
Becker, S., Bobin, J., Candès, E.J.: NESTA: a fast and accurate first-order method for sparse recovery. SIAM J. Imaging Sci. 4(1), 1–39 (2011)
Becker, S., Candès, E.J., Grant, M.: Templates for convex cone problems with applications to sparse signal recovery. Math. Prog. Comput. 3(3), 165–218 (2011)
Belloni, A., Chernozhukov, V., Wang, L.: Square-root LASSO: pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806 (2011)
Ben-Tal, A., Nemirovski, A.: Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. MPS/SIAM Series on Optimization, vol. 3. SIAM, Philadelphia (2001)
Bertsekas, D.P., Tsitsiklis, J.N.: Parallel and Distributed Computation: Numerical methods. Prentice Hall, Englewood Cliffs (1989)
Boţ, R.I., Hendrich, C.: A variable smoothing algorithm for solving convex optimization problems. TOP 23(1), 124–150 (2012)
Boţ, R.I., Hendrich, C.: A double smoothing technique for solving unconstrained nondifferentiable convex optimization problems. Comput. Optim. Appl. 54(2), 239–262 (2013)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
Chen, J., Burer, S.: A first-order smoothing technique for a class of large-scale linear programs. SIAM J. Optim. 24(2), 598–620 (2014)
Combettes, P., Pesquet, J.-C.: Fixed-Point Algorithms for Inverse Problems in Science and Engineering, Chapter Proximal Splitting Methods in Signal Processing, pp. 185–212. Springer, New York (2011)
Devolder, O., Glineur, F., Nesterov, Y.: Double smoothing technique for large-scale linearly constrained convex optimization. SIAM J. Optim. 22(2), 702–727 (2012)
Goldfarb, D., Ma, S.: Fast alternating linearization methods for minimization of the sum of two convex functions. Math. Prog. A 141(1), 349–382 (2012)
Grant, M., Boyd, S., Ye, Y.: Disciplined convex programming. In: Liberti, L., Maculan, N. (eds.) Global Optimization: From Theory to Implementation, Nonconvex Optimization and Its Applications, pp. 155–210. Springer, New York (2006)
Necoara, I., Suykens, J.A.K.: Applications of a smoothing technique to decomposition in convex optimization. IEEE Trans. Autom. Control 53(11), 2674–2679 (2008)
Nedelcu, V., Necoara, I., Tran-Dinh, Q.: Computational complexity of inexact gradient augmented Lagrangian methods: application to constrained MPC. SIAM J. Control Optim. 52(5), 3109–3134 (2014)
Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence \(\mathcal{O}(1/k^2)\). Doklady AN SSSR 269 (Soviet Math. Dokl.), 543–547 (1983)
Nesterov, Y.: Introductory lectures on convex optimization: a basic course. Applied Optimization, vol. 87. Kluwer Academic Publishers, Dordrecht (2004)
Nesterov, Y.: Excessive gap technique in nonsmooth convex minimization. SIAM J. Optim. 16(1), 235–249 (2005)
Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Prog. 103(1), 127–152 (2005)
Nesterov, Y.: Smoothing technique and its applications in semidefinite optimization. Math. Prog. 110(2), 245–259 (2007)
Nesterov, Y.: Gradient methods for minimizing composite objective function. Math. Prog. 140(1), 125–161 (2013)
Orabona, F., Argyriou, A., Srebro, N.: PRISMA: proximal iterative smoothing algorithm. Tech. Report, pp. 1–21 (2012). http://arxiv.org/abs/1206.2372
Parikh, N., Boyd, S.: Proximal algorithms. Found. Trends Optim. 1(3), 123–231 (2013)
Rockafellar, R.T.: Monotropic programming: a generalization of linear programming and network programming. In: Rockafellar, R.T. (ed.) Convexity and Duality in Optimization, pp. 10–36. Springer, New York (1985)
Tran-Dinh, Q.: Sequential Convex Programming and Decomposition Approaches for Nonlinear Optimization. PhD Thesis, Arenberg Doctoral School, KU Leuven, Department of Electrical Engineering (ESAT/SCD) and Optimization in Engineering Center, Kasteelpark Arenberg 10, 3001-Heverlee, Belgium (2012)
Tran-Dinh, Q., Cevher, V.: A primal-dual algorithmic framework for constrained convex minimization, pp. 1–54. Tech. Report, LIONS (2014)
Tran-Dinh, Q., Kyrillidis, A., Cevher, V.: Composite self-concordant minimization. J. Mach. Learn. Res. 15, 374–416 (2015)
Tran-Dinh, Q., Savorgnan, C., Diehl, M.: Combining Lagrangian decomposition and excessive gap smoothing technique for solving large-scale separable convex optimization problems. Comput. Optim. Appl. 55(1), 75–111 (2013)
Yang, J., Zhang, Y.: Alternating direction algorithms for \(\ell _1\) -problems in compressive sensing. SIAM J. Sci. Comput. 33(1–2), 250–278 (2011)
Acknowledgments
This research was partially supported by NSF, Grant No. 161-9884, and NAFOSTED, Grant No. 101.01-2014-24.
Appendix: the proof of technical results
This appendix provides the full proofs of the technical results presented in the main text.
1.1 The proof of Lemma 2: descent property of the proximal gradient step
By using (9) with \(f_{\gamma }({x}) := \varphi ^{*}_{\gamma }({A}^{\top }{x})\), \(\nabla {f_{\gamma }}(\bar{{x}}) = {A}\nabla {\varphi _{\gamma }^{*}}({A}^{\top }\bar{{x}})\), \({z}:= {A}^{\top }{x}\), \(\bar{{z}} := {A}^{\top }\bar{{x}}\), and \(\Vert {A}^{\top }({x}- \bar{{x}})\Vert \le \Vert {A}\Vert \Vert {x}- \bar{{x}}\Vert \) we can show that
Using this estimate, the proof of (14) can be completed similarly to that in [31]. \(\square \)
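For reference, the standard estimate behind this chain-rule argument reads as follows (a sketch under the usual assumption that \(\nabla{\varphi ^{*}_{\gamma }}\) is \((1/\gamma )\)-Lipschitz, as in Nesterov's smoothing framework; it is not a reproduction of (9) or (14) from the main text):
\[
f_{\gamma }({x}) - f_{\gamma }(\bar{{x}}) - \langle \nabla {f_{\gamma }}(\bar{{x}}), {x}- \bar{{x}}\rangle
= \varphi ^{*}_{\gamma }({z}) - \varphi ^{*}_{\gamma }(\bar{{z}}) - \langle \nabla {\varphi ^{*}_{\gamma }}(\bar{{z}}), {z}- \bar{{z}}\rangle
\le \frac{1}{2\gamma }\Vert {z}- \bar{{z}}\Vert ^2
\le \frac{\Vert {A}\Vert ^2}{2\gamma }\Vert {x}- \bar{{x}}\Vert ^2,
\]
with \({z} = {A}^{\top }{x}\) and \(\bar{{z}} = {A}^{\top }\bar{{x}}\).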
1.2 The proof of Lemma 3: key estimate
We first substitute \(\beta = \frac{\gamma _{k+1}}{\Vert {A}\Vert ^2}\) into (14) and use (15) to obtain
Multiplying this inequality by \((1-\tau _k)\) and (14) by \(\tau _k\), and summing up the results, we obtain
where \(\hat{\ell }_{\gamma }^k({x}) := f_{\gamma }(\hat{{x}}^k) + \langle \nabla {f_{\gamma }}(\hat{{x}}^k), {x}- \hat{{x}}^k\rangle + g({x})\).
From (16), we have \(\tau _k\tilde{{x}}^k = \hat{{x}}^k - (1-\tau _k){x}^k\); hence, we can write this inequality as
Using (10) with \(\gamma := \gamma _{k+1}\), \(\hat{\gamma } := \gamma _k\) and \({z}:= {A}^{\top }{x}^k\), we get
which leads to (recall that \(F_{\gamma } = f_{\gamma } + g\)):
Next, we estimate \(\hat{\ell }_{\gamma _{k+1}}^k\). Using the definition of \(f_{\gamma }\) and \(\nabla {f}_{\gamma }\), we can deduce
Substituting \(\tilde{{x}}^{k+1} := \tilde{{x}}^k - \frac{1}{\tau _k}(\hat{{x}}^k - {x}^{k+1})\) from the third line of (16) together with (42), and (43) into (41), we can derive
which is indeed (18), where \(R_k\) is given by (19).
Finally, we prove (20). Indeed, using the strong convexity and the \(L_b\)-smoothness of \(b_{\mathcal {U}}\), we can lower bound
Letting \(\hat{{v}}_k := {u}^{*}_{\gamma _{k+1}}({A}^{\top }\hat{{x}}^k) - \bar{{u}}^c\) and \( {v}_k := {u}^{*}_{\gamma _{k+1}}({A}^{\top }{x}^k) - \bar{{u}}^c\), we write \(R_k\) as
which obviously implies (20). \(\square \)
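The two-sided bound on \(b_{\mathcal {U}}\) used at the start of this argument is the standard one (a sketch, assuming \(b_{\mathcal {U}}\) is \(1\)-strongly convex and \(L_b\)-smooth, which is the usual normalization for prox-functions):
\[
\frac{1}{2}\Vert {u}- {v}\Vert ^2 \;\le \; b_{\mathcal {U}}({u}) - b_{\mathcal {U}}({v}) - \langle \nabla {b_{\mathcal {U}}}({v}), {u}- {v}\rangle \;\le \; \frac{L_b}{2}\Vert {u}- {v}\Vert ^2, \qquad {u}, {v}\in \mathcal {U}.
\]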
1.3 The proof of Lemma 4: the choice of parameters
First, using the update rules of \(\tau _k\) and \(\gamma _k\) in (21), we can express the quantity \(m_k\) as
Moreover, it follows from the properties of \(b_{\mathcal {U}}\) that
Multiplying the lower bound (20) by \(\frac{\gamma _{k+1}}{\tau _k^2}\) and combining the result with the last inequality and the estimate of \(m_k\), we obtain the first lower bound in (22).
Next, using the update rules (21) of \(\tau _k\) and \(\gamma _k\), we have \(\frac{(1-\tau _k)\gamma _{k+1}}{\tau _k^2} = \frac{(k+\bar{c}-1)\gamma _1\bar{c}(k+\bar{c})^2}{(k+\bar{c})^2} = \gamma _1\bar{c}(k+\bar{c}-1) = \frac{\gamma _1\bar{c}(k+\bar{c}-1)^2}{(k+\bar{c}-1)} = \frac{\gamma _k}{\tau _{k-1}^2}\), which is the second equality in (22).
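This identity is easy to check numerically; a small verification sketch, assuming the update rules \(\tau _k = \frac{1}{k+\bar{c}}\) and \(\gamma _{k+1} = \frac{\bar{c}\gamma _1}{k+\bar{c}}\) recalled in the proof of Theorem 2:

```python
import numpy as np

gamma1, c_bar = 2.0, 3.0                               # arbitrary test values
tau = lambda k: 1.0 / (k + c_bar)                      # tau_k = 1/(k + c_bar)
gam = lambda k: c_bar * gamma1 / (k - 1 + c_bar)       # gamma_k = c_bar*gamma1/(k - 1 + c_bar)

for k in range(1, 50):
    lhs = (1.0 - tau(k)) * gam(k + 1) / tau(k) ** 2
    rhs = gam(k) / tau(k - 1) ** 2
    # both sides equal gamma_1 * c_bar * (k + c_bar - 1)
    assert np.isclose(lhs, rhs) and np.isclose(lhs, gamma1 * c_bar * (k + c_bar - 1))
```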
Using (18) with the lower bound of \(R_k\) from (22), we have
where \(\Delta {F}_k := F_{\gamma _k}({x}^k) - F^{\star }\) and \(s_k := \frac{\gamma _1^2\bar{c}^2\left[ (L_b-1)(k + \bar{c}) + 1\right] }{(k + \bar{c})^2}\). Using this inequality and the relation \(\frac{(1-\tau _k)\gamma _{k+1}}{\tau _k^2} = \frac{\gamma _k}{\tau _{k-1}^2}\) in (22), we can easily show that
By induction, we obtain from the last inequality that
which implies (23), where \(S_k := \sum _{i=0}^k s_i = \gamma _1^2\bar{c}^2\sum _{i=0}^k\frac{\left[ (L_b-1)(i + \bar{c}) + 1\right] }{(i + \bar{c})^2}\).
Finally, to prove (24), we use two elementary inequalities \(\sum _{i=1}^{k+\bar{c}}\frac{1}{i} < 1 + \ln (k+\bar{c})\) and \(\sum _{i=0}^k\frac{1}{(i+\bar{c})^2} \le \frac{1}{\bar{c}^2} + \sum _{i=1}^k\frac{1}{(i+\bar{c}-1)(i+\bar{c})} < \frac{1}{\bar{c}^2} + \frac{1}{\bar{c}}\). \(\square \)
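Both elementary inequalities are straightforward to verify numerically; a small sanity check (the first one requires \(k + \bar{c} \ge 2\)):

```python
import math

c_bar = 2
for k in range(1, 200):
    n = k + c_bar
    harmonic = sum(1.0 / i for i in range(1, n + 1))
    assert harmonic < 1.0 + math.log(n)                     # sum_{i=1}^{k+c} 1/i < 1 + ln(k+c)
    tail = sum(1.0 / (i + c_bar) ** 2 for i in range(0, k + 1))
    assert tail < 1.0 / c_bar ** 2 + 1.0 / c_bar            # sum_{i=0}^{k} 1/(i+c)^2 < 1/c^2 + 1/c
```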
1.4 The proof of Corollary 1: the smooth accelerated gradient method
First, similarly to the proof of (44), we can derive
where \(\Delta {F}_k := F_{\gamma _{k}}({x}^k) - F^{\star }\), and \(\hat{s}_k := \frac{\gamma _{k+1}(1-\tau _k)\left[ L_b(\gamma _k - \gamma _{k+1}) - \gamma _{k+1}\tau _k\right] }{\tau _k^2(L_g\gamma _{k+1}+\Vert {A}\Vert ^2)}\).
Next, we impose the condition \(\frac{(1-\tau _k)\gamma _{k+1}}{\tau _k^2(L_g\gamma _{k+1} + \Vert {A}\Vert ^2)} = \frac{\gamma _k}{\tau _{k-1}^2(L_g\gamma _k + \Vert {A}\Vert ^2)}\) and choose \(\tau _k = \frac{1}{k+1}\). Then, we can show from this condition that \(\gamma _{k+1} = \frac{k\gamma _k\Vert {A}\Vert ^2}{L_g\gamma _k + \Vert {A}\Vert ^2(k+1)}\). Now, we show that \(\gamma _k \le \frac{\gamma _1}{k}\). Indeed, we have \(\frac{1}{\gamma _{k+1}} = \left( \frac{k+1}{k}\right) \frac{1}{\gamma _k} + \frac{L_g}{\Vert {A}\Vert ^2k} \ge \left( \frac{k+1}{k}\right) \frac{1}{\gamma _k}\), which implies that \(\gamma _{k+1} \le \frac{k}{k+1}\gamma _k\). By induction, we get \(\gamma _{k+1} \le \frac{\gamma _1}{k+1}\). On the other hand, for \(k\ge 2\), the condition \(\frac{1}{\gamma _{k+1}} = \left( \frac{k+1}{k}\right) \frac{1}{\gamma _k} + \frac{L_g}{\Vert {A}\Vert ^2k} \le \frac{1}{\gamma _k} \left( \frac{k}{k-1}\right) \) is equivalent to \(\gamma _k \le \frac{\Vert {A}\Vert ^2}{L_g(k-1)}\), which holds because \(\gamma _k \le \frac{\gamma _1}{k} = \frac{\Vert {A}\Vert ^2}{L_gk}\) due to the choice of \(\gamma _1\). Hence, with the choice \(\gamma _1 := \frac{\Vert {A}\Vert ^2}{L_g}\), the estimate \(\frac{1}{\gamma _{k+1}} \le \frac{1}{\gamma _k} \left( \frac{k}{k-1}\right) \) and the update rule of \(\gamma _k\) eventually imply
This condition leads to \(\frac{\tau _{k-1}^2(L_g\gamma _k + \Vert {A}\Vert ^2)}{\gamma _k} = L_g\tau _{k-1}^2 + \frac{\tau _{k-1}^2}{\gamma _k}\Vert {A}\Vert ^2 \le \frac{L_g}{k^2} + \frac{3L_g(k-1)}{k^2} \le \frac{3L_g}{k}\). Using the estimates of \(\tau _k\) and \(\gamma _k\), we can easily show that \(\hat{s}_k \le \frac{L_b\Vert {A}\Vert ^2}{L_g^2k(k+2)} + \frac{L_b\Vert {A}\Vert ^2}{L_g^2(k+1)(k+2)} + \frac{(L_b-1)\Vert {A}\Vert ^2k}{L_g(k+1)(k+2)}\). Hence, we can prove that
Using this estimate, we can show that
Finally, using the bound (11) and \(\gamma _k \le \frac{\Vert {A}\Vert ^2}{L_g(k+1)} < \frac{\Vert {A}\Vert ^2}{L_gk}\), we obtain (32). \(\square \)
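As a side remark, the update rule \(\gamma _{k+1} = \frac{k\gamma _k\Vert {A}\Vert ^2}{L_g\gamma _k + \Vert {A}\Vert ^2(k+1)}\) with \(\gamma _1 = \frac{\Vert {A}\Vert ^2}{L_g}\) admits the closed form \(\gamma _k = \frac{\Vert {A}\Vert ^2}{L_g(2k-1)}\) (by induction on the recursion), from which \(\gamma _k \le \frac{\Vert {A}\Vert ^2}{L_gk}\) and, for \(k\ge 2\), \(\frac{1}{\gamma _k} \le \frac{3L_g(k-1)}{\Vert {A}\Vert ^2}\) follow immediately. This closed form is not stated explicitly above; the snippet below checks it numerically under arbitrary test values for \(L_g\) and \(\Vert {A}\Vert ^2\).

```python
import numpy as np

Lg, normA2 = 4.0, 9.0                     # test values for L_g and ||A||^2
gamma = normA2 / Lg                       # gamma_1 = ||A||^2 / L_g
for k in range(1, 200):
    assert np.isclose(gamma, normA2 / (Lg * (2 * k - 1)))    # closed form gamma_k
    assert gamma <= normA2 / (Lg * k) + 1e-12                # gamma_k <= ||A||^2/(L_g k)
    if k >= 2:
        assert gamma >= normA2 / (3 * Lg * (k - 1)) - 1e-12  # 1/gamma_k <= 3 L_g (k-1)/||A||^2
    gamma = k * gamma * normA2 / (Lg * gamma + normA2 * (k + 1))   # update rule from the proof
```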
1.5 The proof of Theorem 2: primal solution recovery
Let \(\Delta {F}_k := F_{\gamma _{k}}({x}^k) - F^{\star }\). Then, by (11), we have \(\Delta {F}_k \ge F({x}^k) - F^{\star }- \gamma _kD_{\mathcal {U}} \ge -\gamma _kD_{\mathcal {U}}\). Similar to the proof of Lemma 4, we can prove that
where \(s_i := \frac{\left[ (L_b-1)(i + \bar{c}) + 1\right] }{(i + \bar{c})^2}\) as in the proof of Lemma 4, and \(\Delta {\hat{\ell }}_{\gamma _{k+1}}^k({x}) = \big \langle {x}, {A}{u}^{*}_{\gamma _{k+1}}(\hat{{x}}^k)\big \rangle - \varphi ({u}^{*}_{\gamma _{k+1}}(\hat{{x}}^k)) + g({x}) - F^{\star }= \big \langle {x}, {A}{u}^{*}_{\gamma _{k+1}}(\hat{{x}}^k) - {b}\big \rangle - \varphi ({u}^{*}_{\gamma _{k+1}}(\hat{{x}}^k)) + s_{\mathcal {K}}({x}) - F^{\star }\). Summing up this inequality from \(i=1\) to \(i=k\) and using \(\tau _0 = 1\) and \(\tilde{{x}}^0 ={x}^0\), we obtain
where \(S_k := \sum _{i=1}^ks_i\). Now, using (46) again with \(k=1\), \({x}^0 = \tilde{{x}}^0\) and \(\tau _0 = 1\), we get \(\gamma _1\Delta {F}_1 \le \gamma _1\tau _0\Delta {\hat{\ell }}_{\gamma _1}(x) + \frac{\Vert {A}\Vert ^2}{2}\left( \Vert {x}^0 - {x}^{\star }\Vert ^2 - \Vert \tilde{{x}}^1 - {x}^{\star }\Vert ^2\right) \). Substituting this into (47), we obtain
Combining \(\bar{{u}}^k\) defined by (35) with \(w_i :=\frac{\gamma _{i+1}}{\tau _i}\), and the convexity of \(\varphi \), we have
Substituting this into (48) and then using \(\Delta {F}_{k+1} \ge -\gamma _{k+1}D_{\mathcal {U}}\) we get
which implies
Rearranging this inequality, we get
where we use the relation \(-s_{\mathcal {K}}({x}) = -\sup _{{r}\in \mathcal {K}}\langle {x}, {r}\rangle = \inf _{{r}\in \mathcal {K}}\langle {x}, -{r}\rangle \). On the other hand, by the saddle point theory for the primal and dual problems (33) and (34), for any optimal solution \({x}^{\star }\), we can show that
Since this inequality holds for any \({r}\in \mathcal {K}\) and \({u}\in \mathcal {U}\), by using \({u}= \bar{{u}}^k\), it leads to
Combining (49) and (50) yields
Taking \({x}:= {x}^0 - \Vert {A}\Vert ^{-2}\varGamma _k({A}\bar{{u}}^k - {b}+ {r})\) for any \({r}\in \mathcal {K}\), we obtain from (51) that
which implies (by the Cauchy–Schwarz inequality)
By elementary calculations and \(\mathrm {dist}\left( {b}- {A}\bar{{u}}^k, \mathcal {K}\right) = \inf \left\{ \Vert {A}\bar{{u}}^k-{b}+ {r}\Vert : {r}\in \mathcal {K}\right\} \), we can show from the last inequality that
To prove the first estimate of (37), we use (49) with \({x}= \varvec{0}^p\) and \(F^{\star } = -\varphi ^{\star }\) to get
Since we apply Algorithm 1(c) to solve the dual problem (34) using \(b_{\mathcal {U}}\) such that \(L_b = 1\), we have \(S_k \le 2\gamma _1^2\). Then, by using \(\gamma _{k+1} = \frac{\bar{c}\gamma _1}{k+\bar{c}}\) and \(\tau _k := \frac{1}{k+\bar{c}}\), we can show that \(\frac{\gamma _{k+1}^2}{\tau _k^2} = \gamma _1^2\bar{c}^2\). Moreover, we also have \(\varGamma _k := \sum _{i=0}^k\frac{\gamma _{i+1}}{\tau _i} = \gamma _1\bar{c}(k+1)\). Using these estimates and \(S_k \le 2\gamma _1^2\) (see (4)) in (52) and (53), we obtain (37). For the left-hand side inequality in the first estimate of (37), we use the simple bound \(-\Vert {x}^{\star }\Vert \mathrm {dist}\left( {b}- {A}{u},\mathcal {K}\right) \le \varphi ({u}) - \varphi ^{\star }\) for \({u}= \bar{{u}}^k \in \mathcal {U}\), which follows from the saddle point theory as in (50). \(\square \)
Cite this article
Tran-Dinh, Q. Adaptive smoothing algorithms for nonsmooth composite convex minimization. Comput Optim Appl 66, 425–451 (2017). https://doi.org/10.1007/s10589-016-9873-6
Keywords
- Nesterov’s smoothing technique
- Accelerated proximal-gradient method
- Adaptive algorithm
- Composite convex minimization
- Nonsmooth convex optimization