Abstract
The shape-constrained convex regression problem deals with fitting a convex function to observed data under additional constraints, such as component-wise monotonicity and uniform Lipschitz continuity. This paper provides a unified framework for computing the least squares estimator of a multivariate shape-constrained convex regression function in \({\mathbb {R}}^d\). We prove that the least squares estimator is computable via solving a constrained convex quadratic programming (QP) problem with \((d+1)n\) variables, \(n(n-1)\) linear inequality constraints and n possibly non-polyhedral inequality constraints, where n is the number of data points. To solve the generally very large-scale convex QP efficiently, we design a proximal augmented Lagrangian method (proxALM) whose subproblems are solved by the semismooth Newton method. To further accelerate the computation when n is huge, we design a practical implementation of the constraint generation method such that each reduced problem is efficiently solved by our proposed proxALM. Comprehensive numerical experiments, including those in the pricing of basket options and the estimation of production functions in economics, demonstrate that our proposed proxALM outperforms state-of-the-art algorithms, and that the proposed acceleration technique further shortens the computation time by a large margin.
Data availability statement
This manuscript has no associated data or the data will not be deposited. The DOI of our code is https://doi.org/10.5281/zenodo.5543733. In our code, we test our algorithms on synthetic datasets. For the real datasets tested in the paper, we provide links in the Readme.txt file included with our code.
Notes
Strictly speaking, it is no longer a conventional QP problem in the presence of the non-polyhedral constraints. Slightly abusing the notation, here we use QP for convenience.
The code is available at https://doi.org/10.5281/zenodo.5543733.
References
Aıt-Sahalia, Y., Duarte, J.: Nonparametric option pricing under shape restrictions. J. Econom. 116(1–2), 9–47 (2003)
Allon, G., Beenstock, M., Hackman, S., Passy, U., Shapiro, A.: Nonparametric estimation of concave production technologies by entropic methods. J. Appl. Econ. 22(4), 795–816 (2007)
Aybat, N.S., Wang, Z.: A parallel method for large scale convex regression problems. In: 53rd IEEE Conference on Decision and Control, pp. 5710–5717. IEEE (2014)
Balázs, G., György, A., Szepesvári, C.: Near-optimal max-affine estimators for convex regression. In: AISTATS (2015)
Beck, A., Teboulle, M.: Smoothing and first order methods: a unified framework. SIAM J. Optim. 22(2), 557–580 (2012)
Bertsimas, D., Mundru, N.: Sparse convex regression. INFORMS J. Comput. 33(1), 262–279 (2021)
Bertsimas, D., Tsitsiklis, J.N.: Introduction to Linear Optimization, vol. 6. Athena Scientific, Belmont, MA (1997)
Blanchet, J., Glynn, P.W., Yan, J., Zhou, Z.: Multivariate distributionally robust convex regression under absolute error loss. In: Advances in Neural Information Processing Systems, pp. 11817–11826 (2019)
Chen, C., He, B., Ye, Y., Yuan, X.: The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Math. Program. 155(1–2), 57–79 (2016)
Chen, H., Yao, D.D.: Fundamentals of Queueing Networks: Performance, Asymptotics, and Optimization, vol. 46. Springer, Berlin (2013)
Chen, L., Sun, D.F., Toh, K.C.: An efficient inexact symmetric Gauss–Seidel based majorized ADMM for high-dimensional convex composite conic programming. Math. Program. 161(1–2), 237–270 (2017)
Chen, W., Mazumder, R.: Multivariate convex regression at scale. arXiv preprint arXiv:2005.11588 (2020)
Chen, X., Sun, D.F., Sun, J.: Complementarity functions and numerical experiments on some smoothing Newton methods for second-order-cone complementarity problems. Comput. Optim. Appl. 25(1–3), 39–56 (2003)
Cui, Y., Pang, J.S., Sen, B.: Composite difference-max programs for modern statistical estimation problems. SIAM J. Optim. 28(4), 3344–3374 (2018)
Dontchev, A.L., Qi, H., Qi, L.: Quadratic convergence of Newton’s method for convex interpolation and smoothing. Constr. Approx. 19(1) (2003)
Facchinei, F., Pang, J.S.: Finite-Dimensional Variational Inequalities and Complementarity Problems. Springer, Berlin (2007)
Han, J., Sun, D.F.: Newton and quasi-Newton methods for normal maps with polyhedral sets. J. Optim. Theory Appl. 94(3), 659–676 (1997)
Hannah, L.A., Dunson, D.B.: Multivariate convex regression with adaptive partitioning. J. Mach. Learn. Res. 14(1), 3261–3294 (2013)
Hanoch, G., Rothschild, M.: Testing the assumptions of production theory: a nonparametric approach. J. Polit. Econ. 80(2), 256–275 (1972)
Hanson, D., Pledger, G.: Consistency in concave regression. Ann. Stat. pp. 1038–1050 (1976)
Hildreth, C.: Point estimates of ordinates of concave functions. J. Am. Stat. Assoc. 49(267), 598–619 (1954)
Kummer, B.: Newton’s method for non-differentiable functions. Adv. Math. Optim. 45, 114–125 (1988)
Kuosmanen, T.: Representation theorem for convex nonparametric least squares. Economet. J. 11(2), 308–325 (2008)
Li, X., Sun, D.F., Toh, K.C.: On efficiently solving the subproblems of a level-set method for fused lasso problems. SIAM J. Optim. 28(2), 1842–1866 (2018)
Li, X., Sun, D.F., Toh, K.C.: An asymptotically superlinearly convergent semismooth Newton augmented Lagrangian method for linear programming. SIAM J. Optim. 30(3), 2410–2440 (2020)
Li, X., Sun, D.F., Toh, K.C.: On the efficient computation of a generalized Jacobian of the projector over the Birkhoff polytope. Math. Program. 179(1–2), 419–446 (2020)
Lim, E.: On convergence rates of convex regression in multiple dimensions. INFORMS J. Comput. 26(3), 616–628 (2014)
Lim, E., Glynn, P.W.: Consistency of multidimensional convex regression. Oper. Res. 60(1), 196–208 (2012)
Longstaff, F.A., Schwartz, E.S.: Valuing American options by simulation: a simple least-squares approach. Rev. Financ. Stud. 14(1), 113–147 (2001)
Mazumder, R., Choudhury, A., Iyengar, G., Sen, B.: A computational framework for multivariate convex regression and its variants. J. Am. Stat. Assoc. 114(525), 318–331 (2019)
Meyer, R.F., Pratt, J.W.: The consistent assessment and fairing of preference functions. IEEE Trans. Syst. Sci. Cybern. 4(3), 270–278 (1968)
Mifflin, R.: Semismooth and semiconvex functions in constrained optimization. SIAM J. Control. Optim. 15(6), 959–972 (1977)
Moreau, J.J.: Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. France 93, 273–299 (1965)
Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005)
Nocedal, J., Wright, S.: Numerical Optimization. Springer, Berlin (2006)
Qi, H., Yang, X.: Regularity and well-posedness of a dual program for convex best C\(^1\)-spline interpolation. Comput. Optim. Appl. 37(3), 409–425 (2007)
Qi, L., Sun, J.: A nonsmooth version of Newton’s method. Math. Progr. 58(1–3), 353–367 (1993)
Ramsey, F., Schafer, D.: The Statistical Sleuth: A Course in Methods of Data Analysis. Cengage Learning, Boston (2012)
Robinson, S.M.: Some continuity properties of polyhedral multifunctions. In: Mathematical Programming at Oberwolfach, pp. 206–214. Springer (1981)
Rockafellar, R.T.: Convex Analysis, vol. 28. Princeton University Press, Princeton (1970)
Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control. Optim. 14(5), 877–898 (1976)
Seijo, E., Sen, B.: Nonparametric least squares estimation of a multivariate convex regression function. Ann. Stat. 39(3), 1633–1657 (2011)
Sun, D.F., Sun, J.: Semismooth matrix-valued functions. Math. Oper. Res. 27(1), 150–169 (2002)
Varian, H.R.: The nonparametric approach to demand analysis. Econometrica 50, 945–973 (1982)
Varian, H.R.: The nonparametric approach to production analysis. Econometrica 52, 579–597 (1984)
Verbeek, M.: A Guide to Modern Econometrics. Wiley, Hoboken (2008)
Yagi, D., Chen, Y., Johnson, A.L., Kuosmanen, T.: Shape-constrained kernel-weighted least squares: estimating production functions for Chilean manufacturing industries. J. Bus. Econ. Stat. 38, 1–12 (2018)
Zhao, X.Y., Sun, D.F., Toh, K.C.: A Newton-CG augmented Lagrangian method for semidefinite programming. SIAM J. Optim. 20(4), 1737–1765 (2010)
Acknowledgements
The authors would like to thank Professor Necdet S. Aybat for helpful clarifications on his work in [3].
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Defeng Sun is supported in part by Hong Kong Research Grant Council under grant number 15304019 and Kim-Chuan Toh by the Ministry of Education, Singapore, under its Academic Research Fund Tier 3 grant call (MOE-2019-T3-1-010).
Appendices
Derivation of the proximal mapping and generalized Jacobian associated with \(\mathcal {D}=\{x\in {\mathbb {R}}^d\mid \Vert x\Vert _{1}\le L\}\)
For \(x\in {\mathbb {R}}^d\), let \(P_x=\mathrm{Diag}(\mathrm{sign}(x))\in {\mathbb {R}}^{d\times d}\), then

$$\varPi_{\mathcal D}(x)=\begin{cases} x, & \Vert x\Vert_1\le L,\\ L\,P_x\,\varPi_{\varDelta_d}(P_x x/L), & \text{otherwise},\end{cases}$$

where \(\varDelta _d=\{x\in {\mathbb {R}}^d\mid e_d^T x=1,x\ge 0\}\). To derive the generalized Jacobian of \(\varPi _\mathcal {D}(\cdot )\), we need the generalized Jacobian of \(\varPi _{\varDelta _d}(\cdot )\). Following the idea in [17, 26], we can explicitly compute an element of the generalized Jacobian of \(\varPi _{\varDelta _d}(\cdot )\) at \(P_x x/L\). Let K be the set of indices i such that \((\varPi _{\varDelta_d }(P_x x/L))_i=0\), and let \(B=\begin{bmatrix} I_K\\ e_d^T\end{bmatrix}\). Then

$$I_d-B^T(BB^T)^{-1}B$$

is an element in \(\partial \varPi _{\varDelta _d}(P_x x/L)\), where \(I_K\) denotes the matrix consisting of the rows of the identity matrix \(I_d\) indexed by K. After some algebraic computation, we can see that

$$I_d-B^T(BB^T)^{-1}B=\mathrm{Diag}(r)-\frac{rr^T}{e_d^T r},$$

where \(r\in {\mathbb {R}}^d\) is defined as \(r_i=1\) if \((\varPi _{\varDelta_d }(P_x x/L))_i\ne 0 \) and \(r_i=0\) otherwise. Therefore, for \(\Vert x\Vert_1>L\),

$$P_x\Big(\mathrm{Diag}(r)-\frac{rr^T}{e_d^T r}\Big)P_x\in \partial \varPi _{\mathcal D}(x).$$
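The projection and the Jacobian element above can be sketched in a few lines of Python (a minimal illustration, not the paper's accompanying code; the simplex projection uses the standard sort-and-threshold algorithm, and `jacobian_element` assumes \(\Vert x\Vert_1>L\)):

```python
def proj_simplex(v):
    """Euclidean projection of v onto the unit simplex {x >= 0, sum(x) = 1},
    via the standard sort-and-threshold algorithm."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        css += uj
        if uj - (css - 1.0) / j > 0:      # holds exactly for j = 1, ..., rho
            theta = (css - 1.0) / j
    return [max(vi - theta, 0.0) for vi in v]

def sgn(t):
    return (t > 0) - (t < 0)

def proj_l1_ball(x, L):
    """Pi_D(x): equals x if ||x||_1 <= L, else L * P_x * Pi_Delta(P_x x / L)."""
    if sum(abs(t) for t in x) <= L:
        return list(x)
    p = proj_simplex([abs(t) / L for t in x])
    return [L * sgn(t) * pi for t, pi in zip(x, p)]

def jacobian_element(x, L):
    """An element of the generalized Jacobian of Pi_D at x (for ||x||_1 > L):
    P_x (Diag(r) - r r^T / (e^T r)) P_x, with r the support indicator."""
    d = len(x)
    p = proj_simplex([abs(t) / L for t in x])
    r = [1.0 if pi != 0.0 else 0.0 for pi in p]
    s = sum(r)
    sg = [float(sgn(t)) for t in x]
    return [[sg[i] * sg[j] * ((r[i] if i == j else 0.0) - r[i] * r[j] / s)
             for j in range(d)] for i in range(d)]
```

For example, `proj_l1_ball([3.0, -1.0, 0.5], 2.0)` gives the soft-thresholded point \((2,0,0)\), matching the well-known characterization of the \(\ell_1\)-ball projection.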
A symmetric Gauss–Seidel based alternating direction method of multipliers (sGS-ADMM) for (P)
In the literature, popular first-order methods based on the framework of the alternating direction method of multipliers have been applied to solve (P). In [30, Section A.2], the problem (P) is reformulated as
The corresponding augmented Lagrangian function for a fixed \(\sigma >0\) is defined by
Then the two-block ADMM is given as
where \(\tau \in (0,(1+\sqrt{5})/2)\) is a given step length. As described in [30], the subproblem of updating \(\xi \) is separable in the variables \(\xi _i\) for \(i=1,\ldots ,n\), and each \(\xi _i\) can be updated by an interior point method. The update of \(\theta \) and \(\eta \) is performed by a block coordinate descent method, which may converge slowly. One can also apply the directly extended three-block ADMM as in [30, Section 2.1] to solve (P), whose steps are given by
In the directly extended three-block ADMM, the subproblem of updating \(\theta \) can be computed by solving a linear system, and that of updating \(\eta \) can be solved by the projection onto \({\mathbb {R}}_{+}^{n^2}\). However, it is shown in [9] that the directly extended three-block ADMM may not be convergent. Thus it is desirable to employ an algorithm that is guaranteed to converge.
In this section, we aim to present an efficient and convergent multi-block ADMM for solving (P). The authors in [11] have proposed an inexact symmetric Gauss–Seidel based multi-block ADMM for solving high-dimensional convex composite conic optimization problems, and it was demonstrated to perform better than the possibly nonconvergent directly extended multi-block ADMM. To adapt the sGS-ADMM in [11] to solve (P), we first rewrite (P) as follows:
Given a parameter \(\sigma >0\), the augmented Lagrangian function associated with (24) is defined by
Then the sGS-ADMM algorithm for solving (P) is given as in Algorithm 4.
![figure d](http://media.springernature.com/lw685/springer-static/image/art%3A10.1007%2Fs12532-021-00210-0/MediaObjects/12532_2021_210_Figd_HTML.png)
In Algorithm 4, all the subproblems can be solved explicitly. In Step 1, \(\eta ^{k+1}\) and \(y^{k+1}\) are separable and can be solved independently as
where \(\varPi _{\pm }(\cdot )\) denotes the projection onto \({\mathbb {R}}^{n^2}_{\pm }\). In Step 2a and Step 2c, \(\theta \) can be computed by solving the following linear system
By noting that \(A^TA=2nI_n-2e_ne_n^T\), one can apply the Sherman–Morrison–Woodbury formula to compute
Thus \(\theta \) can be computed in O(n) operations. For Step 2b, \(\xi ^{k+1}\) can be computed by solving the linear equation
As the coefficient matrix \(I_{dn}+B^TB\) is a block diagonal matrix consisting of n blocks of \(d\times d\) submatrices, each \(\xi _i\) can be computed separately, and the inverse of each block only needs to be computed once.
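To illustrate how the Sherman–Morrison–Woodbury step yields the O(n) update of \(\theta\), here is a minimal Python sketch. It assumes a coefficient matrix of the form \(\lambda I_n + A^TA = (\lambda+2n)I_n - 2e_ne_n^T\), where \(\lambda>0\) is our placeholder for the actual proximal coefficient; the function name is ours:

```python
def solve_theta(lam, b):
    """Solve ((lam + 2n) I - 2 e e^T) theta = b in O(n).

    Sherman-Morrison with A = (lam + 2n) I, u = -2e, v = e gives
    M^{-1} b = b / c + 2 (e^T b) e / (c * lam),  where c = lam + 2n.
    """
    n = len(b)
    c = lam + 2.0 * n
    s = sum(b)
    return [bi / c + 2.0 * s / (c * lam) for bi in b]

# sanity check: multiply the solution back by M and compare with b
lam, b = 0.7, [1.0, 2.0, 3.0, 4.0, 5.0]
theta = solve_theta(lam, b)
n = len(b)
tot = sum(theta)
residual = max(abs((lam + 2.0 * n) * ti - 2.0 * tot - bi)
               for ti, bi in zip(theta, b))
```

The closed form follows because the rank-one correction only requires the scalar \(e_n^Tb\), so no \(n\times n\) matrix is ever formed or factorized.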
The convergence result of Algorithm 4 is presented in the following theorem, which is taken directly from [11, Theorem 5.1].
Theorem 5
Suppose that the solution set to the KKT system (10) is nonempty. Let \(\{(\theta ^k,\xi ^k,y^k,\eta ^k,u^k,v^k)\}\) be the sequence generated by Algorithm 4. Then \(\{(\theta ^k,\xi ^k,y^k,\eta ^k)\}\) converges to an optimal solution of problem (24), and \(\{(u^k,v^k)\}\) converges to an optimal solution of its dual (D).
More results on comparison of algorithms for solving (P)
Tables 7, 8, 9, 10, 11 and 12 show the comparison among proxALM, sGS-ADMM and MOSEK on instances with relatively large d and n. Note that here we set the stopping criterion to \(R_{\mathrm{KKT}}\le 10^{-6}\) to show that our proposed proxALM is capable of solving the problem (P) to relatively high accuracy. As one can see, when estimating the function \(\psi (x)=\exp (p^T x)\) for moderate \((d,n)=(100,1000)\), proxALM is about 3 times faster than sGS-ADMM, and about 29 times faster than MOSEK. For the case when \(d=100\), \(n=4000\), which is a large problem with 404,000 variables and about 16,000,000 inequality constraints, MOSEK runs out of memory, while proxALM solves it within 7 minutes and sGS-ADMM takes 17 minutes. From the tables, we can see that sGS-ADMM performs much better than MOSEK on each instance, and proxALM performs even better than sGS-ADMM. In most cases, proxALM is at least 10 times faster than MOSEK.
Property of basket option of two European call options
The function V(x, y) is differentiable since it is the solution of the Black-Scholes PDE. By the definition of V, we can see that V is non-decreasing in x and y, which means that \(\nabla V(x,y)\ge 0\). According to the distribution of \(S_T^1\) and \(S_T^2\), we have that
where
For any \(x_1,x_2,y\in {\mathbb {R}}\), we can see that
Similarly, we can prove that for any \(x,y_1,y_2\in {\mathbb {R}}\),
Therefore, we have that \(0\le \nabla V(x,y)\le w\) for any x, y.
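For completeness, the chain of inequalities behind this bound can be sketched as follows, assuming the standard lognormal (Black-Scholes) model for \(S_T^1,S_T^2\) and the basket payoff \((w_1S_T^1+w_2S_T^2-K)_+\) with weights \(w=(w_1,w_2)\) (our notation):

```latex
\begin{align*}
V(x,y) &= e^{-rT}\,\mathbb{E}\big[(w_1 x Z_1 + w_2 y Z_2 - K)_+\big],
\qquad Z_i = e^{(r-\sigma_i^2/2)T+\sigma_i W_T^i},\quad \mathbb{E}[Z_i]=e^{rT}.\\
\intertext{For $x_1\ge x_2$, since $t\mapsto (t-a)_+$ is nondecreasing and $1$-Lipschitz,}
0 &\le V(x_1,y)-V(x_2,y)
\le e^{-rT}\,\mathbb{E}\big[w_1(x_1-x_2)Z_1\big] = w_1(x_1-x_2),
\end{align*}
```

so \(0\le \partial V/\partial x\le w_1\), and similarly \(0\le \partial V/\partial y\le w_2\), which together give \(0\le \nabla V(x,y)\le w\).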
A finite difference method for estimating the basket option of two European call options
It is well known that \(V(x,y)=U(0,x,y)\), where U satisfies the Black-Scholes PDE
Let \(\tau = T-t\), \(u(\tau ,x,y)=U(t,x,y)\), then u satisfies
The above convection-diffusion equation can be solved numerically on a bounded region \((0,x_{\max })\times (0,y_{\max })\) by the standard finite difference method with the artificial boundary conditions
where
and \(\varPhi (\cdot )\) is the cumulative distribution function of the standard normal distribution.
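As an illustration of such a scheme, here is a minimal explicit finite-difference sketch in Python for the two-asset Black-Scholes equation in the variable \(\tau\). All parameter values are ours and purely illustrative; we take zero correlation to drop the cross-derivative term, freeze the \(x=0\) and \(y=0\) edges at the payoff, and use a crude discounted-forward far-field boundary rather than the \(\varPhi\)-based conditions above:

```python
import math

# illustrative parameters (not from the paper): rate, vols, weights, strike, maturity
r, s1, s2 = 0.05, 0.2, 0.3
w1, w2, K, T = 0.5, 0.5, 1.0, 0.25

n, x_max = 21, 2.0                     # n x n grid on [0, x_max]^2
h = x_max / (n - 1)
dt = 0.001                             # small enough for explicit-scheme stability
steps = int(round(T / dt))
xs = [i * h for i in range(n)]

# initial condition at tau = 0: basket call payoff
u = [[max(w1 * x + w2 * y - K, 0.0) for y in xs] for x in xs]

for step in range(steps):
    tau = (step + 1) * dt
    new = [row[:] for row in u]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            x, y = xs[i], xs[j]
            uxx = (u[i+1][j] - 2*u[i][j] + u[i-1][j]) / h**2
            uyy = (u[i][j+1] - 2*u[i][j] + u[i][j-1]) / h**2
            ux = (u[i+1][j] - u[i-1][j]) / (2*h)
            uy = (u[i][j+1] - u[i][j-1]) / (2*h)
            new[i][j] = u[i][j] + dt * (0.5*s1**2*x*x*uxx + 0.5*s2**2*y*y*uyy
                                        + r*x*ux + r*y*uy - r*u[i][j])
    # crude far-field boundary: value of the discounted forward
    for k in range(n):
        new[n-1][k] = w1*x_max + w2*xs[k] - K*math.exp(-r*tau)
        new[k][n-1] = w1*xs[k] + w2*x_max - K*math.exp(-r*tau)
    u = new

price_at_the_money = u[n // 2][n // 2]   # rough approximation of V(1, 1)
```

On this coarse grid the scheme reproduces the qualitative shape constraints of Appendix C: the computed surface is nonnegative and nondecreasing in each asset price.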
Cite this article
Lin, M., Sun, D. & Toh, KC. An augmented Lagrangian method with constraint generation for shape-constrained convex regression problems. Math. Prog. Comp. 14, 223–270 (2022). https://doi.org/10.1007/s12532-021-00210-0
Keywords
- Shape-constrained convex regression
- Preconditioned proximal point algorithm
- Semismooth Newton method
- Constraint generation method