Abstract
This paper proposes an efficient adaptive variant of a quadratic penalty accelerated inexact proximal point (QP-AIPP) method proposed earlier by the authors. Both the QP-AIPP method and its variant solve linearly set constrained nonconvex composite optimization problems using a quadratic penalty approach where the generated penalized subproblems are solved by a variant of the underlying AIPP method. The variant, in turn, solves a given penalized subproblem by generating a sequence of proximal subproblems which are then solved by an accelerated composite gradient algorithm. The main difference between AIPP and its variant is that the proximal subproblems in the former are always convex while the ones in the latter are not necessarily convex due to the fact that their prox parameters are chosen as aggressively as possible so as to improve efficiency. The possibly nonconvex proximal subproblems generated by the AIPP variant are also tentatively solved by a novel adaptive accelerated composite gradient algorithm based on the validity of some key convergence inequalities. As a result, the variant generates a sequence of proximal subproblems where the stepsizes are adaptively changed according to the responses obtained from the calls to the accelerated composite gradient algorithm. Finally, numerical results are given to demonstrate the efficiency of the proposed AIPP and QP-AIPP variants.
Similar content being viewed by others
Notes
See the MovieLens 100K dataset containing 610 users and 9724 movies, which is found in https://grouplens.org/datasets/movielens/.
References
Amir, B.: First-Order Methods in Optimization, vol. 25. SIAM, Philadelphia (2017)
Aybat, N.S., Iyengar, G.: A first-order smoothed penalty method for compressed sensing. SIAM J. Optim. 21(1), 287–313 (2011)
Aybat, N.S., Iyengar, G.: A first-order augmented Lagrangian method for compressed sensing. SIAM J. Optim. 22(2), 429–459 (2012)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Carmon, Y., Duchi, J.C., Hinder, O., Sidford, A.: Accelerated methods for nonconvex optimization. SIAM J. Optim. 28(2), 1751–1772 (2018)
Cartis, C., Gould, N., Toint, P.: On the complexity of steepest descent, Newton’s and regularized Newton’s methods for nonconvex unconstrained optimization problems. SIAM J. Optim. 20(6), 2833–2852 (2010)
Chen, Y., Lan, G., Ouyang, Y.: Optimal primal-dual methods for a class of saddle point problems. SIAM J. Optim. 24(4), 1779–1814 (2014)
Drusvyatskiy, D., Paquette, C.: Efficiency of minimizing compositions of convex functions and smooth maps. Math. Program. 178, 1–56 (2018)
Ghadimi, S., Lan, G.: Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Math. Program. 156, 59–99 (2016)
Ghadimi, S., Lan, G., Zhang, H.: Generalized uniformly optimal methods for nonlinear programming. J. Sci. Comput. 79(3), 1854–1881 (2019)
Gu, Q., Wang, Z., Liu, H.: Sparse pca with oracle property. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 27, pp. 1529–1537. Curran Associates Inc, Red Hook (2014)
He, Y., Monteiro, R.D.C.: Accelerating block-decomposition first-order methods for solving composite saddle-point and two-player Nash equilibrium problems. SIAM J. Optim. 25(4), 2182–2211 (2015)
He, Y., Monteiro, R.D.C.: An accelerated HPE-type algorithm for a class of composite convex-concave saddle-point problems. SIAM J. Optim. 26(1), 29–56 (2016)
Kolossoski, O., Monteiro, R.D.C.: An accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex-concave saddle-point problems. Optim. Methods Softw. 32(6), 1244–1272 (2017)
Kong, W., Melo, J.G., Monteiro, R.D.C.: Complexity of a quadratic penalty accelerated inexact proximal point method for solving linearly constrained nonconvex composite programs. SIAM J. Optim. 29(4), 2566–2593 (2019)
Lan, G., Monteiro, R.D.C.: Iteration-complexity of first-order penalty methods for convex programming. Manuscript, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, 30332 (2008)
Lan, G., Monteiro, R.D.C.: Iteration-complexity of first-order penalty methods for convex programming. Math. Program. 138(1), 115–139 (2013)
Lan, G., Monteiro, R.D.C.: Iteration-complexity of first-order augmented Lagrangian methods for convex programming. Math. Program. 155(1), 511–547 (2016)
Li, H., Lin, Z.: Accelerated proximal gradient methods for nonconvex programming. Adv. Neural Inf. Process. Syst. 28, 379–387 (2015)
Liang, J., Monteiro, R.D.C., Sim, C.K.: A fista-type accelerated gradient algorithm for solving smooth nonconvex composite optimization problems. ar**v preprint ar**v:1905.07010 (2019)
Liu, Y.F., Liu, X., Ma, S.: On the nonergodic convergence rate of an inexact augmented lagrangian framework for composite convex programming. Math. Oper. Res. 44(2), 632–650 (2019)
Lu, Z., Zhou, Z.: Iteration-complexity of first-order augmented Lagrangian methods for convex conic programming. Available on ar**v:1803.09941 (2018)
Monteiro, R.D.C., Ortiz, C., Svaiter, B.F.: An adaptive accelerated first-order method for convex optimization. Comput. Optim. Appl. 64, 31–73 (2016)
Monteiro, R.D.C., Svaiter, B.F.: Iteration-complexity of a Newton proximal extragradient method for monotone variational inequalities and inclusion problems. SIAM J. Optim. 22(3), 914–935 (2012)
Monteiro, R.D.C., Svaiter, B.F.: An accelerated hybrid proximal extragradient method for convex optimization and its implications to second-order methods. SIAM J. Optim. 23(2), 1092–1125 (2013)
Necoara, I., Patrascu, A., Glineur, F.: Complexity of first-order inexact Lagrangian and penalty methods for conic convex programming. Optim. Methods Softw. 34, 1–31 (2017)
Nesterov, Y.E.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publ, Boston (2004)
Nesterov, Y.E., Polyak, B.T.: Cubic regularization of newton method and its global performance. Math. Program. 108(1), 177–205 (2006)
Paquette, C., Lin, H., Drusvyatskiy, D., Mairal, J., Harchaoui, Z.: Catalyst for gradient-based nonconvex optimization. In: AISTATS 2018-21st International Conference on Artificial Intelligence and Statistics, pp 1–10 (2018)
Patrascu, A., Necoara, I., Tran-Dinh, Q.: Adaptive inexact fast augmented Lagrangian methods for constrained convex optimization. Optim. Lett. 11(3), 609–626 (2017)
Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14(5), 877–898 (1976)
Solodov, M.V., Svaiter, B.F.: A hybrid approximate extragradient-proximal point algorithm using the enlargement of a maximal monotone operator. Set-Valued Var. Anal. 7(4), 323–345 (1999)
Tran-Dinh, Q.: Proximal alternating penalty algorithms for nonsmooth constrained convex optimization. Comput. Optim. Appl. 72(1), 1–43 (2019)
Xu, Y.: Iteration complexity of inexact augmented Lagrangian methods for constrained convex programming. Available on ar**v:1711.05812 (2017)
Yao, Q., Kwok, J.T.: Efficient learning with a family of nonconvex regularizers by redistributing nonconvexity. J. Mach. Learn. Res. 18, 179–1 (2017)
Acknowledgements
Weiwei Kong: This author acknowledges the support of the Natural Sciences and Engineering Research Council of Canada (NSERC), [funding reference number PGSD3-516700-2018], and the partial support of ONR Grant N00014-18-1-2077. Renato D.C. Monteiro: The works of this author were partially supported by ONR Grant N00014-18-1-2077. Jefferson G. Melo: The works of this author were paritally supported by CNPq grant 312559/2019-4 and FAPEG/GO.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Appendix
This appendix contains proofs and statements of several technical results used in the main body of the paper. It contains three subsections. The first subsection consists of proofs about the refinement procedure of Sect. 2; the second subsection consists of proofs about the R-ACG algorithm of Sect. 3; and the third subsection consists of technical results related to Sect. 5.
1.1 Properties of the refinement procedure
Proof of Proposition 2.1
It follows from [15, Lemma 19] with \((f,h,L)=(f_{{\lambda }},h_{{\lambda }},M_{{\lambda }})\) that \(\varDelta \ge 0\) and
Dividing by \({\lambda }\) and rearranging terms yields
Adding \(\nabla g({\hat{z}})\) to both sides and using the definition of \({\hat{v}}\) gives
which is the inclusion in (2.8).
We now bound \({\lambda }\Vert {\hat{v}}\Vert\). Since [15, Lemma 19] implies that \(\Vert z-{\hat{z}}\Vert \le \sqrt{2M_{{\lambda }}^{-1}\varDelta }\) and \(\nabla g\) is M–Lipschitz continuous then
which is the inequality in (2.8). \(\square\)
1.2 Properties of the R-ACG algorithm
Proof of Proposition 3.2(a)
Let \(\ell\) denote the quantity in (3.15). Assume that the R-ACG algorithm has performed \(\ell\)-iterations without declaring failure. In view of step 2 of the R-ACG algorithm, it follows that both (3.10) and (3.11) hold for every \(1\le j\le \ell\). We will show that it must stop successfully at the end of the \(\ell ^{\mathrm{th}}\) iteration, and hence that the conclusion of the lemma holds. Indeed, note that (3.14), (3.15), and the fact that \(\log (1+t) \ge t/2\) for all \(t\in [0,1]\) implies that
Combining the triangle inequality, (3.10), the fact that \(2/A_\ell \le 1/C\) and \((2/A_\ell )^2< 2/A_\ell < 1\) from (A.1), and the relation \((a+b)^2\le 2(a^2+b^2)\) for all \(a, b \in \mathfrak {R}\), we obtain
On the other hand, using the triangle inequality and the fact that \((a+b)^2\le (1+s)a^2+(1+1/s)b^2\) for every \((a,b,s)\in \mathfrak {R}\times \mathfrak {R}\times R_{++}\) (under the choice of \(s=1/(\sqrt{C}-1)\)), we obtain
Combining the previous estimates, we then conclude that
which, after a simple algebraic manipulation, easily implies that
Using the first term in the maximum of (3.16) together with the second inequality of (A.3) immediately implies that (3.12) holds with \(j=\ell\). To show that (3.13) holds at \(j=\ell\), observe that the definition of \(\psi\) in (3.3), (3.11) with \(j=\ell\), the second inequality of (A.3), and the second term in the maximum of (3.16) imply that
\(\square\)
1.3 Results related to Sect. 5
Lemma A.1
Assume that\(f, h:\mathfrak {R}^{n}\mapsto (-\infty ,\infty ]\)satisfy assumptions (C1) and (C3) in Sect. 5, and that, in addition,fis lower semicontinuous on\(\mathrm {cl}\,(\mathrm {dom}\,h)\). Then,\(\varphi := f+h\)is a proper lower semicontinuous function which has a global minimum over\(\mathfrak {R}^n\).
Proof
Suppose \(\bar{z}\in \mathfrak {R}^{n}\backslash \mathrm {cl}\,(\mathrm {dom}\,h)\). Since \(\mathrm {cl}\,(\mathrm {dom}\,h)\) is closed, there exists \(\varepsilon >0\) such that \(h(u) = \infty\) for every \(u \in \mathfrak {R}^{n}\backslash \mathrm {cl}\,(\mathrm {dom}\,h)\) satisfying \(\Vert u-{\bar{z}}\Vert < \varepsilon\). Hence, \(\liminf _{u\rightarrow \bar{z}}\varphi (u)=\infty =\varphi (\bar{z})\). Now suppose \(\bar{z}\in \mathrm {cl}\,(\mathrm {dom}\,h)\). By the lower semicontinuity of f and h we have
and, since f is differentiable on \(\mathrm {dom}\,h\), the function \(\varphi\) is proper lower semicontinuous with \(\mathrm {dom}\,\varphi = \mathrm {dom}\,h\). The last statement of the lemma follows from the well known fact that infimum of a lower semicontinuous function over a bounded set, namely, \(\mathrm {dom}\,\varphi\), is always attained. \(\square\)
1.4 Comparison with the AIPP method
This subsection presents some computational results that compare the AIPP method of [15] with the R-AIPPc method described at the beginning of Sect. 6. The main problem of interest for this sub-subsection is the quadratic matrix problem described in Sect. 6.1.1.
We now describe the particular implementation of the AIPP method used in this sub-subsection, which differs from its description in [15] in two ways. First, its innermost subroutine, namely, the ACG method, stops immediately when a quadruple \(({\lambda }_k, z_k, v_k, \varepsilon _k)\) satisfying (2.14) is found. Second, for each iteration k of the method, a triple \(({{\hat{z}}}, {{\hat{v}}}, \varDelta )\) is generated from the refinement procedure in Section 2 by assigning \(({{\hat{z}}}, {{\hat{v}}}, \varDelta ) = RP({\lambda }_k, z_{k-1}, z_k, v_k)\), and the method stops with the desired output when \({{\hat{v}}}\) satisfies condition (6.1).
All experiment parameters for the R-AIPPc method and the problem instances are as described in Sect. 6.1.1 below, while the AIPP uses a parameter input of \((\sigma ,{\lambda })=(0.3, 1/(2m))\) for its results.
We now present the numerical tables for this set of problem instances (Table 16).
Rights and permissions
About this article
Cite this article
Kong, W., Melo, J.G. & Monteiro, R.D.C. An efficient adaptive accelerated inexact proximal point method for solving linearly constrained nonconvex composite problems. Comput Optim Appl 76, 305–346 (2020). https://doi.org/10.1007/s10589-020-00188-w
Received:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10589-020-00188-w
Keywords
- Quadratic penalty method
- Nonconvex program
- Iteration-complexity
- Proximal point method
- First-order accelerated methods