An efficient adaptive accelerated inexact proximal point method for solving linearly constrained nonconvex composite problems

Kong, Weiwei; Melo, Jefferson G.; Monteiro, Renato D. C.

doi:10.1007/s10589-020-00188-w

An efficient adaptive accelerated inexact proximal point method for solving linearly constrained nonconvex composite problems

Published: 25 April 2020

Volume 76, pages 305–346, (2020)
Cite this article

Computational Optimization and Applications Aims and scope Submit manuscript

Weiwei Kong¹,
Jefferson G. Melo² &
Renato D. C. Monteiro¹

532 Accesses
Explore all metrics

Abstract

This paper proposes an efficient adaptive variant of a quadratic penalty accelerated inexact proximal point (QP-AIPP) method proposed earlier by the authors. Both the QP-AIPP method and its variant solve linearly set constrained nonconvex composite optimization problems using a quadratic penalty approach where the generated penalized subproblems are solved by a variant of the underlying AIPP method. The variant, in turn, solves a given penalized subproblem by generating a sequence of proximal subproblems which are then solved by an accelerated composite gradient algorithm. The main difference between AIPP and its variant is that the proximal subproblems in the former are always convex while the ones in the latter are not necessarily convex due to the fact that their prox parameters are chosen as aggressively as possible so as to improve efficiency. The possibly nonconvex proximal subproblems generated by the AIPP variant are also tentatively solved by a novel adaptive accelerated composite gradient algorithm based on the validity of some key convergence inequalities. As a result, the variant generates a sequence of proximal subproblems where the stepsizes are adaptively changed according to the responses obtained from the calls to the accelerated composite gradient algorithm. Finally, numerical results are given to demonstrate the efficiency of the proposed AIPP and QP-AIPP variants.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

EUR 32.99 /Month

Get 10 units per month
Download Article/Chapter or Ebook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Subscribe now

Buy Now

Price includes VAT (Germany)

Instant access to the full article PDF.

Institutional subscriptions

Primal and dual mixed-integer least-squares: distributional statistics and global algorithm

Article Open access 24 June 2024

The Big-M method with the numerical infinite M

Article Open access 17 September 2020

Global Convergence of ADMM in Nonconvex Nonsmooth Optimization

Article 07 June 2018

Notes

See the MovieLens 100K dataset containing 610 users and 9724 movies, which is found in https://grouplens.org/datasets/movielens/.

References

Amir, B.: First-Order Methods in Optimization, vol. 25. SIAM, Philadelphia (2017)
MATH Google Scholar
Aybat, N.S., Iyengar, G.: A first-order smoothed penalty method for compressed sensing. SIAM J. Optim. 21(1), 287–313 (2011)
Article MathSciNet Google Scholar
Aybat, N.S., Iyengar, G.: A first-order augmented Lagrangian method for compressed sensing. SIAM J. Optim. 22(2), 429–459 (2012)
Article MathSciNet Google Scholar
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Article MathSciNet Google Scholar
Carmon, Y., Duchi, J.C., Hinder, O., Sidford, A.: Accelerated methods for nonconvex optimization. SIAM J. Optim. 28(2), 1751–1772 (2018)
Article MathSciNet Google Scholar
Cartis, C., Gould, N., Toint, P.: On the complexity of steepest descent, Newton’s and regularized Newton’s methods for nonconvex unconstrained optimization problems. SIAM J. Optim. 20(6), 2833–2852 (2010)
Article MathSciNet Google Scholar
Chen, Y., Lan, G., Ouyang, Y.: Optimal primal-dual methods for a class of saddle point problems. SIAM J. Optim. 24(4), 1779–1814 (2014)
Article MathSciNet Google Scholar
Drusvyatskiy, D., Paquette, C.: Efficiency of minimizing compositions of convex functions and smooth maps. Math. Program. 178, 1–56 (2018)
MathSciNet MATH Google Scholar
Ghadimi, S., Lan, G.: Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Math. Program. 156, 59–99 (2016)
Article MathSciNet Google Scholar
Ghadimi, S., Lan, G., Zhang, H.: Generalized uniformly optimal methods for nonlinear programming. J. Sci. Comput. 79(3), 1854–1881 (2019)
Article MathSciNet Google Scholar
Gu, Q., Wang, Z., Liu, H.: Sparse pca with oracle property. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 27, pp. 1529–1537. Curran Associates Inc, Red Hook (2014)
Google Scholar
He, Y., Monteiro, R.D.C.: Accelerating block-decomposition first-order methods for solving composite saddle-point and two-player Nash equilibrium problems. SIAM J. Optim. 25(4), 2182–2211 (2015)
Article MathSciNet Google Scholar
He, Y., Monteiro, R.D.C.: An accelerated HPE-type algorithm for a class of composite convex-concave saddle-point problems. SIAM J. Optim. 26(1), 29–56 (2016)
Article MathSciNet Google Scholar
Kolossoski, O., Monteiro, R.D.C.: An accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex-concave saddle-point problems. Optim. Methods Softw. 32(6), 1244–1272 (2017)
Article MathSciNet Google Scholar
Kong, W., Melo, J.G., Monteiro, R.D.C.: Complexity of a quadratic penalty accelerated inexact proximal point method for solving linearly constrained nonconvex composite programs. SIAM J. Optim. 29(4), 2566–2593 (2019)
Article MathSciNet Google Scholar
Lan, G., Monteiro, R.D.C.: Iteration-complexity of first-order penalty methods for convex programming. Manuscript, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, 30332 (2008)
Lan, G., Monteiro, R.D.C.: Iteration-complexity of first-order penalty methods for convex programming. Math. Program. 138(1), 115–139 (2013)
Article MathSciNet Google Scholar
Lan, G., Monteiro, R.D.C.: Iteration-complexity of first-order augmented Lagrangian methods for convex programming. Math. Program. 155(1), 511–547 (2016)
Article MathSciNet Google Scholar
Li, H., Lin, Z.: Accelerated proximal gradient methods for nonconvex programming. Adv. Neural Inf. Process. Syst. 28, 379–387 (2015)
Google Scholar
Liang, J., Monteiro, R.D.C., Sim, C.K.: A fista-type accelerated gradient algorithm for solving smooth nonconvex composite optimization problems. ar**v preprint ar**v:1905.07010 (2019)
Liu, Y.F., Liu, X., Ma, S.: On the nonergodic convergence rate of an inexact augmented lagrangian framework for composite convex programming. Math. Oper. Res. 44(2), 632–650 (2019)
Article MathSciNet Google Scholar
Lu, Z., Zhou, Z.: Iteration-complexity of first-order augmented Lagrangian methods for convex conic programming. Available on ar**v:1803.09941 (2018)
Monteiro, R.D.C., Ortiz, C., Svaiter, B.F.: An adaptive accelerated first-order method for convex optimization. Comput. Optim. Appl. 64, 31–73 (2016)
Article MathSciNet Google Scholar
Monteiro, R.D.C., Svaiter, B.F.: Iteration-complexity of a Newton proximal extragradient method for monotone variational inequalities and inclusion problems. SIAM J. Optim. 22(3), 914–935 (2012)
Article MathSciNet Google Scholar
Monteiro, R.D.C., Svaiter, B.F.: An accelerated hybrid proximal extragradient method for convex optimization and its implications to second-order methods. SIAM J. Optim. 23(2), 1092–1125 (2013)
Article MathSciNet Google Scholar
Necoara, I., Patrascu, A., Glineur, F.: Complexity of first-order inexact Lagrangian and penalty methods for conic convex programming. Optim. Methods Softw. 34, 1–31 (2017)
MathSciNet MATH Google Scholar
Nesterov, Y.E.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publ, Boston (2004)
Book Google Scholar
Nesterov, Y.E., Polyak, B.T.: Cubic regularization of newton method and its global performance. Math. Program. 108(1), 177–205 (2006)
Article MathSciNet Google Scholar
Paquette, C., Lin, H., Drusvyatskiy, D., Mairal, J., Harchaoui, Z.: Catalyst for gradient-based nonconvex optimization. In: AISTATS 2018-21st International Conference on Artificial Intelligence and Statistics, pp 1–10 (2018)
Patrascu, A., Necoara, I., Tran-Dinh, Q.: Adaptive inexact fast augmented Lagrangian methods for constrained convex optimization. Optim. Lett. 11(3), 609–626 (2017)
Article MathSciNet Google Scholar
Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14(5), 877–898 (1976)
Article MathSciNet Google Scholar
Solodov, M.V., Svaiter, B.F.: A hybrid approximate extragradient-proximal point algorithm using the enlargement of a maximal monotone operator. Set-Valued Var. Anal. 7(4), 323–345 (1999)
Article MathSciNet Google Scholar
Tran-Dinh, Q.: Proximal alternating penalty algorithms for nonsmooth constrained convex optimization. Comput. Optim. Appl. 72(1), 1–43 (2019)
Article MathSciNet Google Scholar
Xu, Y.: Iteration complexity of inexact augmented Lagrangian methods for constrained convex programming. Available on ar**v:1711.05812 (2017)
Yao, Q., Kwok, J.T.: Efficient learning with a family of nonconvex regularizers by redistributing nonconvexity. J. Mach. Learn. Res. 18, 179–1 (2017)
MathSciNet MATH Google Scholar

Download references

Acknowledgements

Weiwei Kong: This author acknowledges the support of the Natural Sciences and Engineering Research Council of Canada (NSERC), [funding reference number PGSD3-516700-2018], and the partial support of ONR Grant N00014-18-1-2077. Renato D.C. Monteiro: The works of this author were partially supported by ONR Grant N00014-18-1-2077. Jefferson G. Melo: The works of this author were paritally supported by CNPq grant 312559/2019-4 and FAPEG/GO.

Author information

Authors and Affiliations

School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, 30332-0205, USA
Weiwei Kong & Renato D. C. Monteiro
Institute of Mathematics and Statistics, Federal University of Goias, Campus II, Caixa Postal 131, Goiânia, GO, CEP 74001-970, Brazil
Jefferson G. Melo

Authors

Weiwei Kong
View author publications
You can also search for this author in PubMed Google Scholar
Jefferson G. Melo
View author publications
You can also search for this author in PubMed Google Scholar
Renato D. C. Monteiro
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Weiwei Kong.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

This appendix contains proofs and statements of several technical results used in the main body of the paper. It contains three subsections. The first subsection consists of proofs about the refinement procedure of Sect. 2; the second subsection consists of proofs about the R-ACG algorithm of Sect. 3; and the third subsection consists of technical results related to Sect. 5.

1.1 Properties of the refinement procedure

Proof of Proposition 2.1

It follows from [15, Lemma 19] with $(f,h,L)=(f_{{\lambda }},h_{{\lambda }},M_{{\lambda }})$ that $\varDelta \ge 0$ and

$$\begin{aligned} M_{{\lambda }}(z-{\hat{z}})\in \nabla f_{{\lambda }}(z)+\partial h_{{\lambda }}({\hat{z}})={\lambda }\nabla g(z)+(z-z^{-}-v)+{\lambda }\partial h({\hat{z}}). \end{aligned}$$

Dividing by ${\lambda }$ and rearranging terms yields

$$\begin{aligned} \frac{1}{{\lambda }}\left[ M_{{\lambda }}(z-{\hat{z}})+(v+z^{-}-z)\right] -\nabla g(z)\in \partial h({\hat{z}}). \end{aligned}$$

Adding $\nabla g({\hat{z}})$ to both sides and using the definition of ${\hat{v}}$ gives

$$\begin{aligned} {\hat{v}}=\frac{1}{{\lambda }}\left[ M_{{\lambda }}(z-{\hat{z}})+(v+z^{-}-z)\right] +\nabla g({\hat{z}})-\nabla g(z)\in \nabla g({\hat{z}})+\partial h({\hat{z}}), \end{aligned}$$

which is the inclusion in (2.8).

We now bound ${\lambda }\Vert {\hat{v}}\Vert$. Since [15, Lemma 19] implies that $\Vert z-{\hat{z}}\Vert \le \sqrt{2M_{{\lambda }}^{-1}\varDelta }$ and $\nabla g$ is M–Lipschitz continuous then

$$\begin{aligned} {\lambda }\Vert {\hat{v}}\Vert&\le \Vert M_{{\lambda }}(z-{\hat{z}})\Vert +\Vert v+z^{-}-z\Vert +{\lambda }\Vert \nabla g({\hat{z}})-\nabla g(z)\Vert \\&\le \sqrt{2M_{{\lambda }}\varDelta }+\Vert v+z^{-}-z\Vert +{\lambda }M\Vert {\hat{z}}-z\Vert \le \sqrt{2M_{{\lambda }}\varDelta }+\Vert v+z^{-}-z\Vert +M_{{\lambda }}\Vert {\hat{z}}-z\Vert \\&\le \sqrt{2M_{{\lambda }}\varDelta }+\Vert v+z^{-}-z\Vert +M_{{\lambda }}\sqrt{2M_{{\lambda }}^{-1}\varDelta } =\Vert v+z^{-}-z\Vert +2\sqrt{2M_{{\lambda }}\varDelta }, \end{aligned}$$

which is the inequality in (2.8). $\square$

1.2 Properties of the R-ACG algorithm

Proof of Proposition 3.2(a)

Let $\ell$ denote the quantity in (3.15). Assume that the R-ACG algorithm has performed $\ell$-iterations without declaring failure. In view of step 2 of the R-ACG algorithm, it follows that both (3.10) and (3.11) hold for every $1\le j\le \ell$. We will show that it must stop successfully at the end of the $\ell ^{\mathrm{th}}$ iteration, and hence that the conclusion of the lemma holds. Indeed, note that (3.14), (3.15), and the fact that $\log (1+t) \ge t/2$ for all $t\in [0,1]$ implies that

$$\begin{aligned} A_\ell \ge \frac{2}{1+ 2{{\widetilde{M}}}}\left( 1+ \frac{1}{2} \sqrt{\frac{1}{1+2{{\widetilde{M}}}}}\right) ^{2(\ell -1)}\ge 2C > 2. \end{aligned}$$

(A.1)

Combining the triangle inequality, (3.10), the fact that $2/A_\ell \le 1/C$ and $(2/A_\ell )^2< 2/A_\ell < 1$ from (A.1), and the relation $(a+b)^2\le 2(a^2+b^2)$ for all $a, b \in \mathfrak {R}$, we obtain

$$\begin{aligned} \Vert u_\ell \Vert ^2+2\eta _\ell&\le \max \{1/A_\ell ^2,1/(2A_\ell )\}(\Vert A_\ell u_\ell \Vert ^{2}+4A_\ell \eta _\ell )\\&\le \max \{1/A_\ell ^2,1/(2A_\ell )\}(2\Vert A_\ell u_\ell +x_\ell -x_0\Vert ^{2} +2\Vert x_\ell -x_0\Vert ^{2}+4A_\ell \eta _\ell )\\&\le \max \{(2/A_\ell )^2,2/A_\ell \}\Vert x_\ell -x_0\Vert ^{2} \le \frac{1}{C}\Vert x_\ell -x_0\Vert ^{2}. \end{aligned}$$

On the other hand, using the triangle inequality and the fact that $(a+b)^2\le (1+s)a^2+(1+1/s)b^2$ for every $(a,b,s)\in \mathfrak {R}\times \mathfrak {R}\times R_{++}$ (under the choice of $s=1/(\sqrt{C}-1)$), we obtain

$$\begin{aligned} \Vert x_\ell -x_0\Vert ^{2}\le \frac{\sqrt{C}}{\sqrt{C}-1}\Vert x_0-x_\ell +u_\ell \Vert ^2 +\sqrt{C}\Vert u_\ell \Vert ^2. \end{aligned}$$

Combining the previous estimates, we then conclude that

$$\begin{aligned} \Vert u_\ell \Vert ^2+2\eta _\ell \le \frac{1}{C-\sqrt{C}}\Vert x_0-x_\ell +u_\ell \Vert ^2 + \frac{1}{\sqrt{C}}\Vert u_\ell \Vert ^2, \end{aligned}$$

(A.2)

which, after a simple algebraic manipulation, easily implies that

$$\begin{aligned} \frac{1}{\sqrt{C}-1}\Vert x_{0}-x_{\ell }+u_{\ell }\Vert ^{2} \ge 2\sqrt{C}\eta _{\ell }+\left( \sqrt{C}-1\right) \Vert u_{\ell }\Vert ^{2} \ge \left( \sqrt{C}-1\right) \left( \Vert u_{\ell }\Vert ^{2} + 2\eta _{\ell } \right) . \end{aligned}$$

(A.3)

Using the first term in the maximum of (3.16) together with the second inequality of (A.3) immediately implies that (3.12) holds with $j=\ell$. To show that (3.13) holds at $j=\ell$, observe that the definition of $\psi$ in (3.3), (3.11) with $j=\ell$, the second inequality of (A.3), and the second term in the maximum of (3.16) imply that

$$\begin{aligned} \widetilde{\phi }(x_{0})-\widetilde{\phi }(x_{\ell })&\ge \left\langle u_{\ell },x_{0}-x_{\ell }\right\rangle +\eta _{\ell }+\frac{1}{2}\Vert x_{\ell }-x_{0}\Vert ^{2} = \frac{1}{2}\left[ \Vert x_{0}-x_{\ell }+u_{\ell }\Vert ^{2}-\left( \Vert u_{\ell }\Vert ^{2} +2\eta _{\ell }\right) \right] \\&\ge \frac{1}{2} \left[ 1 + \left( \sqrt{C}-1\right) ^{-2}\right] \Vert x_{0} -x_{\ell }+u_{\ell }\Vert ^{2} \ge \frac{1}{\theta }\Vert x_{0}-x_{\ell }+u_{\ell }\Vert ^{2}. \end{aligned}$$

$\square$

1.3 Results related to Sect. 5

Lemma A.1

Assume that$f, h:\mathfrak {R}^{n}\mapsto (-\infty ,\infty ]$satisfy assumptions (C1) and (C3) in Sect. 5, and that, in addition,fis lower semicontinuous on$\mathrm {cl}\,(\mathrm {dom}\,h)$. Then,$\varphi := f+h$is a proper lower semicontinuous function which has a global minimum over$\mathfrak {R}^n$.

Proof

Suppose $\bar{z}\in \mathfrak {R}^{n}\backslash \mathrm {cl}\,(\mathrm {dom}\,h)$. Since $\mathrm {cl}\,(\mathrm {dom}\,h)$ is closed, there exists $\varepsilon >0$ such that $h(u) = \infty$ for every $u \in \mathfrak {R}^{n}\backslash \mathrm {cl}\,(\mathrm {dom}\,h)$ satisfying $\Vert u-{\bar{z}}\Vert < \varepsilon$. Hence, $\liminf _{u\rightarrow \bar{z}}\varphi (u)=\infty =\varphi (\bar{z})$. Now suppose $\bar{z}\in \mathrm {cl}\,(\mathrm {dom}\,h)$. By the lower semicontinuity of f and h we have

$$\begin{aligned} \liminf _{u\rightarrow \bar{z}}\varphi (u)\ge \liminf _{u\rightarrow \bar{z}}f(u)+\liminf _{u\rightarrow \bar{z}}h(u)\ge f(\bar{z})+h(\bar{z})=\varphi (\bar{z}) \end{aligned}$$

and, since f is differentiable on $\mathrm {dom}\,h$, the function $\varphi$ is proper lower semicontinuous with $\mathrm {dom}\,\varphi = \mathrm {dom}\,h$. The last statement of the lemma follows from the well known fact that infimum of a lower semicontinuous function over a bounded set, namely, $\mathrm {dom}\,\varphi$, is always attained. $\square$

1.4 Comparison with the AIPP method

This subsection presents some computational results that compare the AIPP method of [15] with the R-AIPPc method described at the beginning of Sect. 6. The main problem of interest for this sub-subsection is the quadratic matrix problem described in Sect. 6.1.1.

We now describe the particular implementation of the AIPP method used in this sub-subsection, which differs from its description in [15] in two ways. First, its innermost subroutine, namely, the ACG method, stops immediately when a quadruple $({\lambda }_k, z_k, v_k, \varepsilon _k)$ satisfying (2.14) is found. Second, for each iteration k of the method, a triple $({{\hat{z}}}, {{\hat{v}}}, \varDelta )$ is generated from the refinement procedure in Section 2 by assigning $({{\hat{z}}}, {{\hat{v}}}, \varDelta ) = RP({\lambda }_k, z_{k-1}, z_k, v_k)$, and the method stops with the desired output when ${{\hat{v}}}$ satisfies condition (6.1).

All experiment parameters for the R-AIPPc method and the problem instances are as described in Sect. 6.1.1 below, while the AIPP uses a parameter input of $(\sigma ,{\lambda })=(0.3, 1/(2m))$ for its results.

We now present the numerical tables for this set of problem instances (Table 16).

Table 16 Iteration counts for QM problems with fixed m

Full size table

Rights and permissions

Reprints and permissions

About this article

Cite this article

Kong, W., Melo, J.G. & Monteiro, R.D.C. An efficient adaptive accelerated inexact proximal point method for solving linearly constrained nonconvex composite problems. Comput Optim Appl 76, 305–346 (2020). https://doi.org/10.1007/s10589-020-00188-w

Download citation

Received: 20 December 2018
Published: 25 April 2020
Issue Date: June 2020
DOI: https://doi.org/10.1007/s10589-020-00188-w

Keywords

Mathematics Subject Classification

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

EUR 32.99 /Month

Get 10 units per month
Download Article/Chapter or Ebook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Subscribe now

Buy Now

Price includes VAT (Germany)

Instant access to the full article PDF.

Institutional subscriptions

An efficient adaptive accelerated inexact proximal point method for solving linearly constrained nonconvex composite problems

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Primal and dual mixed-integer least-squares: distributional statistics and global algorithm

The Big-M method with the numerical infinite M

Global Convergence of ADMM in Nonconvex Nonsmooth Optimization

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Appendix

1.1 Properties of the refinement procedure

Proof of Proposition 2.1

1.2 Properties of the R-ACG algorithm

Proof of Proposition 3.2(a)

1.3 Results related to Sect. 5

Lemma A.1

Proof

1.4 Comparison with the AIPP method

Rights and permissions

About this article

Cite this article

Keywords

Mathematics Subject Classification

Subscribe and save

Buy Now

Navigation

An efficient adaptive accelerated inexact proximal point method for solving linearly constrained nonconvex composite problems

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Primal and dual mixed-integer least-squares: distributional statistics and global algorithm

The Big-M method with the numerical infinite M

Global Convergence of ADMM in Nonconvex Nonsmooth Optimization

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Appendix

Appendix

1.1 Properties of the refinement procedure

Proof of Proposition 2.1

1.2 Properties of the R-ACG algorithm

Proof of Proposition 3.2(a)

1.3 Results related to Sect. 5

Lemma A.1

Proof

1.4 Comparison with the AIPP method

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification

Subscribe and save

Buy Now

Search

Navigation