Abstract
The shape-constrained convex regression problem deals with fitting a convex function to observed data under additional constraints, such as component-wise monotonicity and uniform Lipschitz continuity. This paper provides a unified framework for computing the least squares estimator of a multivariate shape-constrained convex regression function in \({\mathbb {R}}^d\). We prove that the least squares estimator is computable via solving a constrained convex quadratic programming (QP) problem with \((d+1)n\) variables, \(n(n-1)\) linear inequality constraints and n possibly non-polyhedral inequality constraints, where n is the number of data points. To solve the generally very large-scale convex QP efficiently, we design a proximal augmented Lagrangian method (proxALM) whose subproblems are solved by the semismooth Newton method. To further accelerate the computation when n is huge, we design a practical implementation of the constraint generation method such that each reduced problem is efficiently solved by our proposed proxALM. Comprehensive numerical experiments, including those in the pricing of basket options and the estimation of production functions in economics, demonstrate that our proposed proxALM outperforms state-of-the-art algorithms, and that the proposed acceleration technique further shortens the computation time by a large margin.
Data availability statement
This manuscript has no associated data or the data will not be deposited. The DOI of our code is https://doi.org/10.5281/zenodo.5543733. In our code, we test our algorithms on synthetic datasets. For the real datasets tested in the paper, we provide links in the Readme.txt file included with our code.
Notes
Strictly speaking, it is no longer a conventional QP problem in the presence of the non-polyhedral constraints. Slightly abusing the notation, here we use QP for convenience.
The code is available at https://doi.org/10.5281/zenodo.5543733.
References
Aıt-Sahalia, Y., Duarte, J.: Nonparametric option pricing under shape restrictions. J. Econom. 116(1–2), 9–47 (2003)
Allon, G., Beenstock, M., Hackman, S., Passy, U., Shapiro, A.: Nonparametric estimation of concave production technologies by entropic methods. J. Appl. Econ. 22(4), 795–816 (2007)
Aybat, N.S., Wang, Z.: A parallel method for large scale convex regression problems. In: 53rd IEEE Conference on Decision and Control, pp. 5710–5717. IEEE (2014)
Balázs, G., György, A., Szepesvári, C.: Near-optimal max-affine estimators for convex regression. In: AISTATS (2015)
Beck, A., Teboulle, M.: Smoothing and first order methods: a unified framework. SIAM J. Optim. 22(2), 557–580 (2012)
Bertsimas, D., Mundru, N.: Sparse convex regression. INFORMS J. Comput. 33(1), 262–279 (2021)
Bertsimas, D., Tsitsiklis, J.N.: Introduction to Linear Optimization, vol. 6. Athena Scientific, Belmont, MA (1997)
Blanchet, J., Glynn, P.W., Yan, J., Zhou, Z.: Multivariate distributionally robust convex regression under absolute error loss. In: Advances in Neural Information Processing Systems, pp. 11817–11826 (2019)
Chen, C., He, B., Ye, Y., Yuan, X.: The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Math. Program. 155(1–2), 57–79 (2016)
Chen, H., Yao, D.D.: Fundamentals of Queueing Networks: Performance, Asymptotics, and Optimization, vol. 46. Springer, Berlin (2013)
Chen, L., Sun, D.F., Toh, K.C.: An efficient inexact symmetric Gauss–Seidel based majorized ADMM for high-dimensional convex composite conic programming. Math. Program. 161(1–2), 237–270 (2017)
Chen, W., Mazumder, R.: Multivariate convex regression at scale. arXiv preprint arXiv:2005.11588 (2020)
Chen, X., Sun, D.F., Sun, J.: Complementarity functions and numerical experiments on some smoothing Newton methods for second-order-cone complementarity problems. Comput. Optim. Appl. 25(1–3), 39–56 (2003)
Cui, Y., Pang, J.S., Sen, B.: Composite difference-max programs for modern statistical estimation problems. SIAM J. Optim. 28(4), 3344–3374 (2018)
Dontchev, A.L., Qi, H., Qi, L.: Quadratic convergence of Newton’s method for convex interpolation and smoothing. Constr. Approx. 19(1) (2003)
Facchinei, F., Pang, J.S.: Finite-Dimensional Variational Inequalities and Complementarity Problems. Springer, Berlin (2007)
Han, J., Sun, D.F.: Newton and quasi-Newton methods for normal maps with polyhedral sets. J. Optim. Theory Appl. 94(3), 659–676 (1997)
Hannah, L.A., Dunson, D.B.: Multivariate convex regression with adaptive partitioning. J. Mach. Learn. Res. 14(1), 3261–3294 (2013)
Hanoch, G., Rothschild, M.: Testing the assumptions of production theory: a nonparametric approach. J. Polit. Econ. 80(2), 256–275 (1972)
Hanson, D., Pledger, G.: Consistency in concave regression. Ann. Stat. pp. 1038–1050 (1976)
Hildreth, C.: Point estimates of ordinates of concave functions. J. Am. Stat. Assoc. 49(267), 598–619 (1954)
Kummer, B.: Newton’s method for non-differentiable functions. Adv. Math. Optim. 45, 114–125 (1988)
Kuosmanen, T.: Representation theorem for convex nonparametric least squares. Economet. J. 11(2), 308–325 (2008)
Li, X., Sun, D.F., Toh, K.C.: On efficiently solving the subproblems of a level-set method for fused lasso problems. SIAM J. Optim. 28(2), 1842–1866 (2018)
Li, X., Sun, D.F., Toh, K.C.: An asymptotically superlinearly convergent semismooth Newton augmented Lagrangian method for linear programming. SIAM J. Optim. 30(3), 2410–2440 (2020)
Li, X., Sun, D.F., Toh, K.C.: On the efficient computation of a generalized Jacobian of the projector over the Birkhoff polytope. Math. Program. 179(1–2), 419–446 (2020)
Lim, E.: On convergence rates of convex regression in multiple dimensions. INFORMS J. Comput. 26(3), 616–628 (2014)
Lim, E., Glynn, P.W.: Consistency of multidimensional convex regression. Oper. Res. 60(1), 196–208 (2012)
Longstaff, F.A., Schwartz, E.S.: Valuing American options by simulation: a simple least-squares approach. Rev. Financ. Stud. 14(1), 113–147 (2001)
Mazumder, R., Choudhury, A., Iyengar, G., Sen, B.: A computational framework for multivariate convex regression and its variants. J. Am. Stat. Assoc. 114(525), 318–331 (2019)
Meyer, R.F., Pratt, J.W.: The consistent assessment and fairing of preference functions. IEEE Trans. Syst. Sci. Cybern. 4(3), 270–278 (1968)
Mifflin, R.: Semismooth and semiconvex functions in constrained optimization. SIAM J. Control. Optim. 15(6), 959–972 (1977)
Moreau, J.J.: Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. France 93, 273–299 (1965)
Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005)
Nocedal, J., Wright, S.: Numerical Optimization. Springer, Berlin (2006)
Qi, H., Yang, X.: Regularity and well-posedness of a dual program for convex best C\(^1\)-spline interpolation. Comput. Optim. Appl. 37(3), 409–425 (2007)
Qi, L., Sun, J.: A nonsmooth version of Newton’s method. Math. Progr. 58(1–3), 353–367 (1993)
Ramsey, F., Schafer, D.: The Statistical Sleuth: A Course in Methods of Data Analysis. Cengage Learning, Boston (2012)
Robinson, S.M.: Some continuity properties of polyhedral multifunctions. In: Mathematical Programming at Oberwolfach, pp. 206–214. Springer (1981)
Rockafellar, R.T.: Convex Analysis, vol. 28. Princeton University Press, Princeton (1970)
Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control. Optim. 14(5), 877–898 (1976)
Seijo, E., Sen, B.: Nonparametric least squares estimation of a multivariate convex regression function. Ann. Stat. 39(3), 1633–1657 (2011)
Sun, D.F., Sun, J.: Semismooth matrix-valued functions. Math. Oper. Res. 27(1), 150–169 (2002)
Varian, H.R.: The nonparametric approach to demand analysis. Econometrica 50, 945–973 (1982)
Varian, H.R.: The nonparametric approach to production analysis. Econometrica 52, 579–597 (1984)
Verbeek, M.: A Guide to Modern Econometrics. Wiley, Hoboken (2008)
Yagi, D., Chen, Y., Johnson, A.L., Kuosmanen, T.: Shape-constrained kernel-weighted least squares: estimating production functions for Chilean manufacturing industries. J. Bus. Econ. Stat. 38, 1–12 (2018)
Zhao, X.Y., Sun, D.F., Toh, K.C.: A Newton-CG augmented Lagrangian method for semidefinite programming. SIAM J. Optim. 20(4), 1737–1765 (2010)
Acknowledgements
The authors would like to thank Professor Necdet S. Aybat for helpful clarifications on his work in [3].
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Defeng Sun is supported in part by Hong Kong Research Grant Council under grant number 15304019 and Kim-Chuan Toh by the Ministry of Education, Singapore, under its Academic Research Fund Tier 3 grant call (MOE-2019-T3-1-010).
Appendices
Derivation of the proximal mapping and generalized Jacobian associated with \(\mathcal {D}=\{x\in {\mathbb {R}}^d\mid \Vert x\Vert _{1}\le L\}\)
For \(x\in {\mathbb {R}}^d\), let \(P_x=\mathrm{Diag}(\mathrm{sign}(x))\in {\mathbb {R}}^{d\times d}\), then

$$\varPi_{\mathcal D}(x)=\begin{cases} x, & \Vert x\Vert_1\le L,\\ L\,P_x\,\varPi_{\varDelta_d}(P_x x/L), & \text{otherwise},\end{cases}$$

where \(\varDelta _d=\{x\in {\mathbb {R}}^d\mid e_d^T x=1,x\ge 0\}\). To derive the generalized Jacobian of \(\varPi _\mathcal {D}(\cdot )\), we need the generalized Jacobian of \(\varPi _{\varDelta _d}(\cdot )\). Following the idea in [17, 26], we can explicitly compute an element of the generalized Jacobian of \(\varPi _{\varDelta _d}(\cdot )\) at \(P_x x/L\). Let K be the set of indices i such that \((\varPi _{\varDelta_d }(P_x x/L))_i=0\), and let \(B=\begin{bmatrix} I_K\\ e_d^T\end{bmatrix}\). Then

$$I_d-B^T(BB^T)^{-1}B$$

is an element in \(\partial \varPi _{\varDelta _d}(P_x x/L)\), where \(I_K\) denotes the matrix consisting of the rows of the identity matrix \(I_d\) indexed by K. After some algebraic computation, we can see that

$$I_d-B^T(BB^T)^{-1}B=\mathrm{Diag}(r)-\frac{rr^T}{e_d^T r},$$

where \(r\in {\mathbb {R}}^d\) is defined as \(r_i=1\) if \((\varPi _{\varDelta_d }(P_x x/L))_i\ne 0 \) and \(r_i=0\) otherwise. Therefore, for \(\Vert x\Vert_1>L\),

$$P_x\Big(\mathrm{Diag}(r)-\frac{rr^T}{e_d^T r}\Big)P_x\in \partial \varPi _{\mathcal D}(x).$$
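The projection and the Jacobian element above can be sketched in a few lines of Python (a minimal illustration, not the paper's accompanying code; the simplex projection uses the standard sort-and-threshold algorithm, and `jacobian_element` assumes \(\Vert x\Vert_1>L\)):

```python
def proj_simplex(v):
    """Euclidean projection of v onto the unit simplex {x >= 0, sum(x) = 1},
    via the standard sort-and-threshold algorithm."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        css += uj
        if uj - (css - 1.0) / j > 0:      # holds exactly for j = 1, ..., rho
            theta = (css - 1.0) / j
    return [max(vi - theta, 0.0) for vi in v]

def sgn(t):
    return (t > 0) - (t < 0)

def proj_l1_ball(x, L):
    """Pi_D(x): equals x if ||x||_1 <= L, else L * P_x * Pi_Delta(P_x x / L)."""
    if sum(abs(t) for t in x) <= L:
        return list(x)
    p = proj_simplex([abs(t) / L for t in x])
    return [L * sgn(t) * pi for t, pi in zip(x, p)]

def jacobian_element(x, L):
    """An element of the generalized Jacobian of Pi_D at x (for ||x||_1 > L):
    P_x (Diag(r) - r r^T / (e^T r)) P_x, with r the support indicator."""
    d = len(x)
    p = proj_simplex([abs(t) / L for t in x])
    r = [1.0 if pi != 0.0 else 0.0 for pi in p]
    s = sum(r)
    sg = [float(sgn(t)) for t in x]
    return [[sg[i] * sg[j] * ((r[i] if i == j else 0.0) - r[i] * r[j] / s)
             for j in range(d)] for i in range(d)]
```

For example, `proj_l1_ball([3.0, -1.0, 0.5], 2.0)` gives the soft-thresholded point \((2,0,0)\), matching the well-known characterization of the \(\ell_1\)-ball projection.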
A symmetric Gauss–Seidel based alternating direction method of multipliers (sGS-ADMM) for (P)
In the literature, popular first-order methods based on the framework of the alternating direction method of multipliers have been applied to solve (P). In [30, Section A.2], the problem (P) is reformulated as
The corresponding augmented Lagrangian function for a fixed \(\sigma >0\) is defined by
Then the two-block ADMM is given as
where \(\tau \in (0,(1+\sqrt{5})/2)\) is a given step length. As described in [30], the subproblem of updating \(\xi \) is separable in the variables \(\xi _i\) for \(i=1,\ldots ,n\), and each \(\xi _i\) can be updated by an interior point method. The update of \(\theta \) and \(\eta \) is performed by a block coordinate descent method, which may converge slowly. One can also apply the directly extended three-block ADMM as in [30, Section 2.1] to solve (P), whose steps are given by
In the directly extended three-block ADMM, the subproblem of updating \(\theta \) can be computed by solving a linear system, and that of updating \(\eta \) can be solved by the projection onto \({\mathbb {R}}_{+}^{n^2}\). However, it is shown in [9] that the directly extended three-block ADMM may not be convergent. Thus it is desirable to employ an algorithm that is guaranteed to converge.
In this section, we aim to present an efficient and convergent multi-block ADMM for solving (P). The authors in [11] have proposed an inexact symmetric Gauss–Seidel based multi-block ADMM for solving high-dimensional convex composite conic optimization problems, and it was demonstrated to perform better than the possibly nonconvergent directly extended multi-block ADMM. To adapt the sGS-ADMM in [11] to solve (P), we first rewrite (P) as follows:
Given a parameter \(\sigma >0\), the augmented Lagrangian function associated with (24) is defined by
Then the sGS-ADMM algorithm for solving (P) is given as in Algorithm 4.
![figure d](http://media.springernature.com/lw685/springer-static/image/art%3A10.1007%2Fs12532-021-00210-0/MediaObjects/12532_2021_210_Figd_HTML.png)
In Algorithm 4, all the subproblems can be solved explicitly. In Step 1, \(\eta ^{k+1}\) and \(y^{k+1}\) are separable and can be solved independently as
where \(\varPi _{\pm }(\cdot )\) denotes the projection onto \({\mathbb {R}}^{n^2}_{\pm }\). In Step 2a and Step 2c, \(\theta \) can be computed by solving the following linear system
By noting that \(A^TA=2nI_n-2e_ne_n^T\), one can apply the Sherman–Morrison–Woodbury formula to compute
Thus \(\theta \) can be computed in O(n) operations. For Step 2b, \(\xi ^{k+1}\) can be computed by solving the linear equation
As the coefficient matrix \(I_{dn}+B^TB\) is a block diagonal matrix consisting of n blocks of \(d\times d\) submatrices, each \(\xi _i\) can be computed separately, and the inverse of each block only needs to be computed once.
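To illustrate how the Sherman–Morrison–Woodbury step yields the O(n) update of \(\theta\), here is a minimal Python sketch. It assumes a coefficient matrix of the form \(\lambda I_n + A^TA = (\lambda+2n)I_n - 2e_ne_n^T\), where \(\lambda>0\) is our placeholder for the actual proximal coefficient; the function name is ours:

```python
def solve_theta(lam, b):
    """Solve ((lam + 2n) I - 2 e e^T) theta = b in O(n).

    Sherman-Morrison with A = (lam + 2n) I, u = -2e, v = e gives
    M^{-1} b = b / c + 2 (e^T b) e / (c * lam),  where c = lam + 2n.
    """
    n = len(b)
    c = lam + 2.0 * n
    s = sum(b)
    return [bi / c + 2.0 * s / (c * lam) for bi in b]

# sanity check: multiply the solution back by M and compare with b
lam, b = 0.7, [1.0, 2.0, 3.0, 4.0, 5.0]
theta = solve_theta(lam, b)
n = len(b)
tot = sum(theta)
residual = max(abs((lam + 2.0 * n) * ti - 2.0 * tot - bi)
               for ti, bi in zip(theta, b))
```

The closed form follows because the rank-one correction only requires the scalar \(e_n^Tb\), so no \(n\times n\) matrix is ever formed or factorized.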
The convergence result of Algorithm 4 is presented in the following theorem, which is taken directly from [11, Theorem 5.1].
Theorem 5
Suppose that the solution set to the KKT system (10) is nonempty. Let \(\{(\theta ^k,\xi ^k,y^k,\eta ^k,u^k,v^k)\}\) be the sequence generated by Algorithm 4. Then \(\{(\theta ^k,\xi ^k,y^k,\eta ^k)\}\) converges to an optimal solution of problem (24), and \(\{(u^k,v^k)\}\) converges to an optimal solution of its dual (D).
More results on comparison of algorithms for solving (P)
Tables 7, 8, 9, 10, 11 and 12 show the comparison among proxALM, sGS-ADMM and MOSEK on instances with relatively large d and n. Note that here we set the stopping criterion to \(R_{\mathrm{KKT}}\le 10^{-6}\) to show that our proposed proxALM is capable of solving the problem (P) to relatively high accuracy. As one can see, when estimating the function \(\psi (x)=\exp (p^T x)\) for moderate \((d,n)=(100,1000)\), proxALM is about 3 times faster than sGS-ADMM, and about 29 times faster than MOSEK. For the case when \(d=100\), \(n=4000\), which is a large problem with 404,000 variables and about 16,000,000 inequality constraints, MOSEK runs out of memory, while proxALM solves it within 7 minutes and sGS-ADMM takes 17 minutes. From the tables, we can see that sGS-ADMM performs much better than MOSEK on each instance, and proxALM performs even better than sGS-ADMM. In most cases, proxALM is at least 10 times faster than MOSEK.
Property of basket option of two European call options
The function V(x, y) is differentiable since it is the solution of the Black-Scholes PDE. By the definition of V, we can see that V is non-decreasing in x and y, which means that \(\nabla V(x,y)\ge 0\). According to the distribution of \(S_T^1\) and \(S_T^2\), we have that
where
For any \(x_1,x_2,y\in {\mathbb {R}}\), we can see that
Similarly, we can prove that for any \(x,y_1,y_2\in {\mathbb {R}}\),
Therefore, we have that \(0\le \nabla V(x,y)\le w\) for any x, y.
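For completeness, the chain of inequalities behind this bound can be sketched as follows, assuming the standard lognormal (Black-Scholes) model for \(S_T^1,S_T^2\) and the basket payoff \((w_1S_T^1+w_2S_T^2-K)_+\) with weights \(w=(w_1,w_2)\) (our notation):

```latex
\begin{align*}
V(x,y) &= e^{-rT}\,\mathbb{E}\big[(w_1 x Z_1 + w_2 y Z_2 - K)_+\big],
\qquad Z_i = e^{(r-\sigma_i^2/2)T+\sigma_i W_T^i},\quad \mathbb{E}[Z_i]=e^{rT}.\\
\intertext{For $x_1\ge x_2$, since $t\mapsto (t-a)_+$ is nondecreasing and $1$-Lipschitz,}
0 &\le V(x_1,y)-V(x_2,y)
\le e^{-rT}\,\mathbb{E}\big[w_1(x_1-x_2)Z_1\big] = w_1(x_1-x_2),
\end{align*}
```

so \(0\le \partial V/\partial x\le w_1\), and similarly \(0\le \partial V/\partial y\le w_2\), which together give \(0\le \nabla V(x,y)\le w\).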
A finite difference method for estimating the basket option of two European call options
It is well known that \(V(x,y)=U(0,x,y)\), where U satisfies the Black-Scholes PDE
Let \(\tau = T-t\), \(u(\tau ,x,y)=U(t,x,y)\), then u satisfies
The above convection-diffusion equation can be solved numerically on a bounded region \((0,x_{\max })\times (0,y_{\max })\) by the standard finite difference method with the artificial boundary conditions
where
and \(\varPhi (\cdot )\) is the cumulative distribution function of the standard normal distribution.
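As an illustration of such a scheme, here is a minimal explicit finite-difference sketch in Python for the two-asset Black-Scholes equation in the variable \(\tau\). All parameter values are ours and purely illustrative; we take zero correlation to drop the cross-derivative term, freeze the \(x=0\) and \(y=0\) edges at the payoff, and use a crude discounted-forward far-field boundary rather than the \(\varPhi\)-based conditions above:

```python
import math

# illustrative parameters (not from the paper): rate, vols, weights, strike, maturity
r, s1, s2 = 0.05, 0.2, 0.3
w1, w2, K, T = 0.5, 0.5, 1.0, 0.25

n, x_max = 21, 2.0                     # n x n grid on [0, x_max]^2
h = x_max / (n - 1)
dt = 0.001                             # small enough for explicit-scheme stability
steps = int(round(T / dt))
xs = [i * h for i in range(n)]

# initial condition at tau = 0: basket call payoff
u = [[max(w1 * x + w2 * y - K, 0.0) for y in xs] for x in xs]

for step in range(steps):
    tau = (step + 1) * dt
    new = [row[:] for row in u]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            x, y = xs[i], xs[j]
            uxx = (u[i+1][j] - 2*u[i][j] + u[i-1][j]) / h**2
            uyy = (u[i][j+1] - 2*u[i][j] + u[i][j-1]) / h**2
            ux = (u[i+1][j] - u[i-1][j]) / (2*h)
            uy = (u[i][j+1] - u[i][j-1]) / (2*h)
            new[i][j] = u[i][j] + dt * (0.5*s1**2*x*x*uxx + 0.5*s2**2*y*y*uyy
                                        + r*x*ux + r*y*uy - r*u[i][j])
    # crude far-field boundary: value of the discounted forward
    for k in range(n):
        new[n-1][k] = w1*x_max + w2*xs[k] - K*math.exp(-r*tau)
        new[k][n-1] = w1*xs[k] + w2*x_max - K*math.exp(-r*tau)
    u = new

price_at_the_money = u[n // 2][n // 2]   # rough approximation of V(1, 1)
```

On this coarse grid the scheme reproduces the qualitative shape constraints of Appendix C: the computed surface is nonnegative and nondecreasing in each asset price.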
Cite this article
Lin, M., Sun, D. & Toh, KC. An augmented Lagrangian method with constraint generation for shape-constrained convex regression problems. Math. Prog. Comp. 14, 223–270 (2022). https://doi.org/10.1007/s12532-021-00210-0
Keywords
- Shape-constrained convex regression
- Preconditioned proximal point algorithm
- Semismooth Newton method
- Constraint generation method