An augmented Lagrangian method with constraint generation for shape-constrained convex regression problems

Abstract

The shape-constrained convex regression problem deals with fitting a convex function to observed data, where additional constraints are imposed, such as component-wise monotonicity and uniform Lipschitz continuity. This paper provides a unified framework for computing the least squares estimator of a multivariate shape-constrained convex regression function in \({\mathbb {R}}^d\). We prove that the least squares estimator is computable via solving essentially a constrained convex quadratic programming (QP) problem with \((d+1)n\) variables, \(n(n-1)\) linear inequality constraints and n possibly non-polyhedral inequality constraints, where n is the number of data points. To efficiently solve the generally very large-scale convex QP, we design a proximal augmented Lagrangian method (proxALM) whose subproblems are solved by the semismooth Newton method. To further accelerate the computation when n is huge, we design a practical implementation of the constraint generation method such that each reduced problem is efficiently solved by our proposed proxALM. Comprehensive numerical experiments, including those in the pricing of basket options and estimation of production functions in economics, demonstrate that our proposed proxALM outperforms the state-of-the-art algorithms, and that the proposed acceleration technique further shortens the computation time by a large margin.

Data availability statement

This manuscript has no associated data or the data will not be deposited. The DOI of our code is https://doi.org/10.5281/zenodo.5543733. In our code, we test our algorithms on synthetic datasets. For the real datasets tested in the paper, we provide the links in Readme.txt in our code.

Notes

  1. Strictly speaking, it is no longer a conventional QP problem in the presence of the non-polyhedral constraints. Slightly abusing the notation, here we use QP for convenience.

  2. The code is available at https://doi.org/10.5281/zenodo.5543733.

  3. https://www.wiley.com/legacy/wileychi/verbeek2ed/datasets.html.

References

  1. Aït-Sahalia, Y., Duarte, J.: Nonparametric option pricing under shape restrictions. J. Econom. 116(1–2), 9–47 (2003)

  2. Allon, G., Beenstock, M., Hackman, S., Passy, U., Shapiro, A.: Nonparametric estimation of concave production technologies by entropic methods. J. Appl. Econ. 22(4), 795–816 (2007)

  3. Aybat, N.S., Wang, Z.: A parallel method for large scale convex regression problems. In: 53rd IEEE Conference on Decision and Control, pp. 5710–5717. IEEE (2014)

  4. Balázs, G., György, A., Szepesvári, C.: Near-optimal max-affine estimators for convex regression. In: AISTATS (2015)

  5. Beck, A., Teboulle, M.: Smoothing and first order methods: a unified framework. SIAM J. Optim. 22(2), 557–580 (2012)

  6. Bertsimas, D., Mundru, N.: Sparse convex regression. INFORMS J. Comput. 33(1), 262–279 (2021)

  7. Bertsimas, D., Tsitsiklis, J.N.: Introduction to Linear Optimization, vol. 6. Athena Scientific, Belmont, MA (1997)

  8. Blanchet, J., Glynn, P.W., Yan, J., Zhou, Z.: Multivariate distributionally robust convex regression under absolute error loss. In: Advances in Neural Information Processing Systems, pp. 11817–11826 (2019)

  9. Chen, C., He, B., Ye, Y., Yuan, X.: The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Math. Program. 155(1–2), 57–79 (2016)

  10. Chen, H., Yao, D.D.: Fundamentals of Queueing Networks: Performance, Asymptotics, and Optimization, vol. 46. Springer, Berlin (2013)

  11. Chen, L., Sun, D.F., Toh, K.C.: An efficient inexact symmetric Gauss–Seidel based majorized ADMM for high-dimensional convex composite conic programming. Math. Program. 161(1–2), 237–270 (2017)

  12. Chen, W., Mazumder, R.: Multivariate convex regression at scale. arXiv preprint arXiv:2005.11588 (2020)

  13. Chen, X., Sun, D.F., Sun, J.: Complementarity functions and numerical experiments on some smoothing Newton methods for second-order-cone complementarity problems. Comput. Optim. Appl. 25(1–3), 39–56 (2003)

  14. Cui, Y., Pang, J.S., Sen, B.: Composite difference-max programs for modern statistical estimation problems. SIAM J. Optim. 28(4), 3344–3374 (2018)

  15. Dontchev, A.L., Qi, H., Qi, L.: Quadratic convergence of Newton's method for convex interpolation and smoothing. Constr. Approx. 19(1) (2003)

  16. Facchinei, F., Pang, J.S.: Finite-Dimensional Variational Inequalities and Complementarity Problems. Springer, Berlin (2007)

  17. Han, J., Sun, D.F.: Newton and quasi-Newton methods for normal maps with polyhedral sets. J. Optim. Theory Appl. 94(3), 659–676 (1997)

  18. Hannah, L.A., Dunson, D.B.: Multivariate convex regression with adaptive partitioning. J. Mach. Learn. Res. 14(1), 3261–3294 (2013)

  19. Hanoch, G., Rothschild, M.: Testing the assumptions of production theory: a nonparametric approach. J. Polit. Econ. 80(2), 256–275 (1972)

  20. Hanson, D., Pledger, G.: Consistency in concave regression. Ann. Stat., pp. 1038–1050 (1976)

  21. Hildreth, C.: Point estimates of ordinates of concave functions. J. Am. Stat. Assoc. 49(267), 598–619 (1954)

  22. Kummer, B.: Newton's method for non-differentiable functions. Adv. Math. Optim. 45, 114–125 (1988)

  23. Kuosmanen, T.: Representation theorem for convex nonparametric least squares. Econom. J. 11(2), 308–325 (2008)

  24. Li, X., Sun, D.F., Toh, K.C.: On efficiently solving the subproblems of a level-set method for fused lasso problems. SIAM J. Optim. 28(2), 1842–1866 (2018)

  25. Li, X., Sun, D.F., Toh, K.C.: An asymptotically superlinearly convergent semismooth Newton augmented Lagrangian method for linear programming. SIAM J. Optim. 30(3), 2410–2440 (2020)

  26. Li, X., Sun, D.F., Toh, K.C.: On the efficient computation of a generalized Jacobian of the projector over the Birkhoff polytope. Math. Program. 179(1–2), 419–446 (2020)

  27. Lim, E.: On convergence rates of convex regression in multiple dimensions. INFORMS J. Comput. 26(3), 616–628 (2014)

  28. Lim, E., Glynn, P.W.: Consistency of multidimensional convex regression. Oper. Res. 60(1), 196–208 (2012)

  29. Longstaff, F.A., Schwartz, E.S.: Valuing American options by simulation: a simple least-squares approach. Rev. Financ. Stud. 14(1), 113–147 (2001)

  30. Mazumder, R., Choudhury, A., Iyengar, G., Sen, B.: A computational framework for multivariate convex regression and its variants. J. Am. Stat. Assoc. 114(525), 318–331 (2019)

  31. Meyer, R.F., Pratt, J.W.: The consistent assessment and fairing of preference functions. IEEE Trans. Syst. Sci. Cybern. 4(3), 270–278 (1968)

  32. Mifflin, R.: Semismooth and semiconvex functions in constrained optimization. SIAM J. Control. Optim. 15(6), 959–972 (1977)

  33. Moreau, J.J.: Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. France 93, 273–299 (1965)

  34. Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005)

  35. Nocedal, J., Wright, S.: Numerical Optimization. Springer, Berlin (2006)

  36. Qi, H., Yang, X.: Regularity and well-posedness of a dual program for convex best C\(^1\)-spline interpolation. Comput. Optim. Appl. 37(3), 409–425 (2007)

  37. Qi, L., Sun, J.: A nonsmooth version of Newton's method. Math. Program. 58(1–3), 353–367 (1993)

  38. Ramsey, F., Schafer, D.: The Statistical Sleuth: A Course in Methods of Data Analysis. Cengage Learning, Boston (2012)

  39. Robinson, S.M.: Some continuity properties of polyhedral multifunctions. In: Mathematical Programming at Oberwolfach, pp. 206–214. Springer (1981)

  40. Rockafellar, R.T.: Convex Analysis, vol. 28. Princeton University Press, Princeton (1970)

  41. Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control. Optim. 14(5), 877–898 (1976)

  42. Seijo, E., Sen, B.: Nonparametric least squares estimation of a multivariate convex regression function. Ann. Stat. 39(3), 1633–1657 (2011)

  43. Sun, D.F., Sun, J.: Semismooth matrix-valued functions. Math. Oper. Res. 27(1), 150–169 (2002)

  44. Varian, H.R.: The nonparametric approach to demand analysis. Econometrica 50, 945–973 (1982)

  45. Varian, H.R.: The nonparametric approach to production analysis. Econometrica 52, 579–597 (1984)

  46. Verbeek, M.: A Guide to Modern Econometrics. Wiley, Hoboken (2008)

  47. Yagi, D., Chen, Y., Johnson, A.L., Kuosmanen, T.: Shape-constrained kernel-weighted least squares: estimating production functions for Chilean manufacturing industries. J. Bus. Econ. Stat. 38, 1–12 (2018)

  48. Zhao, X.Y., Sun, D.F., Toh, K.C.: A Newton-CG augmented Lagrangian method for semidefinite programming. SIAM J. Optim. 20(4), 1737–1765 (2010)

Acknowledgements

The authors would like to thank Professor Necdet S. Aybat for helpful clarifications on his work in [3].

Author information

Corresponding author

Correspondence to Meixia Lin.

Additional information

Defeng Sun is supported in part by Hong Kong Research Grant Council under grant number 15304019 and Kim-Chuan Toh by the Ministry of Education, Singapore, under its Academic Research Fund Tier 3 grant call (MOE-2019-T3-1-010).

Appendices

Derivation of the proximal mapping and generalized Jacobian associated with \(\mathcal {D}=\{x\in {\mathbb {R}}^d\mid \Vert x\Vert _{1}\le L\}\)

For \(x\in {\mathbb {R}}^d\), let \(P_x=\mathrm{Diag}(\mathrm{sign}(x))\in {\mathbb {R}}^{d\times d}\), then

$$\begin{aligned} \varPi _\mathcal {D}(x)&= \underset{y\in {\mathbb {R}}^d}{\arg \min }\ \Big \{\frac{1}{2}\Vert y-x\Vert ^2\mid \Vert y\Vert _1\le L\Big \}\\&= LP_x \Big ( \underset{y\in {\mathbb {R}}^d}{\arg \min }\ \Big \{\frac{1}{2}\Vert y-P_x x/L\Vert ^2\mid e_d^T y \le 1,y\ge 0\Big \}\Big )\\&=\left\{ \begin{aligned}&x&\text{ if }\ \Vert x\Vert _1\le L, \\&L P_x \varPi _{\varDelta _d}(P_x x/L)&\text{ otherwise, } \end{aligned}\right. \end{aligned}$$

where \(\varDelta _d=\{x\in {\mathbb {R}}^d\mid e_d^T x=1,x\ge 0\}\). To derive the generalized Jacobian of \(\varPi _\mathcal {D}(\cdot )\), we need the generalized Jacobian of \(\varPi _{\varDelta _d}(\cdot )\). Following the idea in [17, 26], we can explicitly compute an element of the generalized Jacobian of \(\varPi _{\varDelta _d}(\cdot )\) at \(P_x x/L\). Let K be the set of indices i such that \((\varPi _{\varDelta _d}(P_x x/L))_i=0\). Then

$$\begin{aligned} {\widetilde{H}}=I_d-\begin{bmatrix} I_K^T&e_d \end{bmatrix}\Big ( \begin{bmatrix} I_K \\ e_d^T \end{bmatrix} \begin{bmatrix} I_K^T&e_d \end{bmatrix} \Big )^{\dagger }\begin{bmatrix} I_K \\ e_d^T \end{bmatrix} \end{aligned}$$

is an element in \(\partial \varPi _{\varDelta _d}(P_x x/L)\), where \(I_K\) means the matrix consisting of the rows of the identity matrix \(I_d\), indexed by K. After some algebraic computation, we can see

$$\begin{aligned} {\widetilde{H}}&=I_d-\begin{bmatrix} I_K^T&e_d \end{bmatrix} \begin{bmatrix} I_{|K|}+\frac{1}{d-|K|}e_{|K|} e_{|K|}^T &{} -\frac{1}{d-|K|}e_{|K|} \\ -\frac{1}{d-|K|}e_{|K|}^T &{} \frac{1}{d-|K|} \end{bmatrix} \begin{bmatrix} I_K \\ e_d^T \end{bmatrix}=\mathrm{Diag}(r)-\frac{1}{\mathrm{nnz}(r)}rr^T, \end{aligned}$$

where \(r\in {\mathbb {R}}^d\) is defined as \(r_i=1\) if \((\varPi _{\varDelta _d}(P_x x/L))_i\ne 0 \) and \(r_i=0\) otherwise. Therefore,

$$\begin{aligned} H\in \partial \varPi _\mathcal {D}(x),\quad \text{ where } H=\left\{ \begin{aligned}&I_d&\text{ if }\ \Vert x\Vert _1\le L, \\&P_x {\widetilde{H}} P_x&\text{ otherwise }. \end{aligned}\right. \end{aligned}$$
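
For concreteness, the derivation above translates into a short numerical routine. The following is a minimal sketch in Python/NumPy (not the authors' implementation); `proj_simplex` is the standard sorting-based projection onto \(\varDelta _d\), and all function and variable names are illustrative.

```python
import numpy as np

def proj_simplex(v):
    # Sorting-based Euclidean projection onto the unit simplex {y >= 0, e^T y = 1}.
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.nonzero(u - (css - 1.0) / np.arange(1, v.size + 1) > 0)[0][-1]
    theta = (css[k] - 1.0) / (k + 1.0)
    return np.maximum(v - theta, 0.0)

def proj_l1_ball(x, L):
    # Pi_D(x) for D = {x : ||x||_1 <= L}, via the sign change of variables above.
    if np.abs(x).sum() <= L:
        return x.copy()
    s = np.sign(x)                          # diagonal of P_x
    return L * s * proj_simplex(s * x / L)

def jacobian_l1_ball(x, L):
    # One element H of the generalized Jacobian of Pi_D at x.
    d = x.size
    if np.abs(x).sum() <= L:
        return np.eye(d)
    s = np.sign(x)
    r = (proj_simplex(s * x / L) != 0).astype(float)
    H_tilde = np.diag(r) - np.outer(r, r) / r.sum()   # Diag(r) - r r^T / nnz(r)
    return s[:, None] * H_tilde * s[None, :]          # P_x * H_tilde * P_x
```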

A symmetric Gauss–Seidel based alternating direction method of multipliers (sGS-ADMM) for (P)

In the literature, popular first-order methods based on the framework of the alternating direction method of multipliers have been applied to solve (P). In [30, Section A.2], the problem (P) is reformulated as

$$\begin{aligned} \min _{\theta \in {\mathbb {R}}^n,\xi \in {\mathbb {R}}^{dn},\eta \in {\mathbb {R}}^{n^2}}\ \Big \{\frac{1}{2}\Vert \theta -Y\Vert ^2+p(\xi )+\delta _{+}(\eta )\mid A\theta +B\xi -\eta =0\Big \}. \end{aligned}$$

The corresponding augmented Lagrangian function for a fixed \(\sigma >0\) is defined by

$$\begin{aligned} \widetilde{\mathcal {L}}_{\sigma }(\theta ,\xi ,\eta ;u)=\frac{1}{2}\Vert \theta -Y\Vert ^2+p(\xi )+\delta _{+}(\eta )+\frac{\sigma }{2}\Vert A\theta +B\xi -\eta -\frac{u}{\sigma }\Vert ^2-\frac{1}{2\sigma }\Vert u\Vert ^2. \end{aligned}$$

Then the two-block ADMM is given as

$$\begin{aligned} \left\{ \begin{aligned}&\xi ^{k+1} = \arg \min \ \widetilde{\mathcal {L}}_{\sigma }(\theta ^{k},\xi ,\eta ^{k};u^{k})=\arg \min \ \Big \{ p(\xi )+\frac{\sigma }{2}\Vert A\theta ^k+B\xi -\eta ^k-\frac{u^k}{\sigma }\Vert ^2\Big \},\\&(\theta ^{k+1},\eta ^{k+1}) = \arg \min \ \widetilde{\mathcal {L}}_{\sigma }(\theta ,\xi ^{k+1},\eta ;u^{k}),\\&u^{k+1}=u^k-\tau \sigma (A\theta ^{k+1}+B\xi ^{k+1}-\eta ^{k+1}), \end{aligned}\right. \end{aligned}$$

where \(\tau \in (0,(1+\sqrt{5})/2)\) is a given step length. As described in [30], the subproblem of updating \(\xi \) is separable in the variables \(\xi _i\) for \(i=1,\ldots ,n\), and each \(\xi _i\) can be updated by solving a small subproblem with an interior point method. The update of \(\theta \) and \(\eta \) is performed by a block coordinate descent method, which may converge slowly. One can also apply the directly extended three-block ADMM algorithm as in [30, Section 2.1] to solve (P), whose steps are given by

$$\begin{aligned} \left\{ \begin{aligned}&\xi ^{k+1} = \arg \min \ \widetilde{\mathcal {L}}_{\sigma }(\theta ^{k},\xi ,\eta ^{k};u^{k}),\\&\theta ^{k+1} = \arg \min \ \widetilde{\mathcal {L}}_{\sigma }(\theta ,\xi ^{k+1},\eta ^k;u^{k}),\\&\eta ^{k+1} = \arg \min \ \widetilde{\mathcal {L}}_{\sigma }(\theta ^{k+1},\xi ^{k+1},\eta ;u^{k}),\\&u^{k+1}=u^k-\tau \sigma (A\theta ^{k+1}+B\xi ^{k+1}-\eta ^{k+1}). \end{aligned}\right. \end{aligned}$$

In the directly extended three-block ADMM, the subproblem of updating \(\theta \) reduces to solving a linear system, and that of updating \(\eta \) to a projection onto \({\mathbb {R}}_{+}^{n^2}\). However, it is shown in [9] that the directly extended three-block ADMM may not be convergent. Thus it is desirable to employ an algorithm that is guaranteed to converge.

In this section, we present an efficient and convergent multi-block ADMM for solving (P). The authors of [11] proposed an inexact symmetric Gauss–Seidel based multi-block ADMM for solving high-dimensional convex composite conic optimization problems, which was demonstrated to perform better than the possibly nonconvergent directly extended multi-block ADMM. To adapt the sGS-ADMM in [11] to solve (P), we first rewrite (P) as follows:

$$\begin{aligned} \min _{\theta \in {\mathbb {R}}^n,\xi ,y\in {\mathbb {R}}^{dn},\eta \in {\mathbb {R}}^{n^2}} \ \Big \{\frac{1}{2}\Vert \theta -Y\Vert ^2+p(y)+\delta _{+}(\eta )\Bigm |A\theta +B\xi -\eta =0,\ \xi -y=0\Big \}. \end{aligned}$$
(24)

Given a parameter \(\sigma >0\), the augmented Lagrangian function associated with (24) is defined by

$$\begin{aligned} \widehat{\mathcal {L}}_{\sigma }(\theta ,\xi ,y,\eta ;u,v)&=\frac{1}{2}\Vert \theta -Y\Vert ^2+p(y)+\delta _{+}(\eta )-\langle u,A\theta +B\xi -\eta \rangle -\langle v,\xi -y\rangle \nonumber \\&\qquad +\frac{\sigma }{2}\Vert A\theta +B\xi -\eta \Vert ^2+\frac{\sigma }{2}\Vert \xi -y\Vert ^2\nonumber \\&=\frac{1}{2}\Vert \theta -Y\Vert ^2+p(y)+\delta _{+}(\eta )+\frac{\sigma }{2}\Big \Vert A\theta +B\xi -\eta -\frac{u}{\sigma }\Big \Vert ^2+\frac{\sigma }{2}\Big \Vert \xi -y-\frac{v}{\sigma }\Big \Vert ^2\nonumber \\&\qquad -\frac{1}{2\sigma }\Vert u\Vert ^2-\frac{1}{2\sigma }\Vert v\Vert ^2. \end{aligned}$$
(25)

Then the sGS-ADMM algorithm for solving (P) is given in Algorithm 4.

(Algorithm 4, the sGS-ADMM for solving (P), is displayed as a figure in the published article.)

In Algorithm 4, all the subproblems can be solved explicitly. In Step 1, \(\eta ^{k+1}\) and \(y^{k+1}\) are separable and can be solved independently as

$$\begin{aligned} y^{k+1} = \mathrm{Prox}_{p/\sigma }(\xi ^k-v^k/\sigma ),\quad \eta ^{k+1} = \varPi _{+}(A\theta ^k+B\xi ^k-u^k/\sigma ), \end{aligned}$$

where \(\varPi _{+}(\cdot )\) denotes the projection onto \({\mathbb {R}}^{n^2}_{+}\). In Steps 2a and 2c, \(\theta \) can be computed by solving the following linear system

$$\begin{aligned} (I_n+\sigma A^T A)\theta = Y-\sigma A^T (B\xi -\eta -u/\sigma ). \end{aligned}$$

By noting that \(A^TA=2nI_n-2e_ne_n^T\), one can apply the Sherman-Morrison-Woodbury formula to compute

$$\begin{aligned} (I_n+\sigma A^T A)^{-1} = \frac{1}{1+2\sigma n}(I_n+2\sigma e_ne_n^T). \end{aligned}$$

Thus \(\theta \) can be computed in O(n) operations. For Step 2b, \(\xi ^{k+1}\) can be computed by solving the linear equation

$$\begin{aligned} (I_{dn}+B^T B)\xi = y^{k+1}+v^k/\sigma -B^T(A{\widehat{\theta }}^{k+1}-\eta ^{k+1}-u^k/\sigma ). \end{aligned}$$

As the coefficient matrix \(I_{dn}+B^TB\) is a block diagonal matrix consisting of n blocks of \(d\times d\) submatrices, each \(\xi _i\) can be computed separately, and the inverse of each block only needs to be computed once.
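
As a concrete illustration of the O(n) \(\theta \)-update in Steps 2a and 2c, here is a minimal sketch that applies the inverse above to the right-hand side, assuming the vector \(w=A^T(B\xi -\eta -u/\sigma )\) has already been formed (the name `w` and the helper below are illustrative, not part of the paper's code):

```python
import numpy as np

def solve_theta(Y, w, sigma):
    # Solves (I_n + sigma * A^T A) theta = Y - sigma * w, where A^T A = 2n I_n - 2 e_n e_n^T.
    # By the Sherman-Morrison-Woodbury identity,
    # (I_n + sigma A^T A)^{-1} = (I_n + 2 sigma e_n e_n^T) / (1 + 2 sigma n),
    # so the whole update costs O(n) operations.
    n = Y.size
    rhs = Y - sigma * w
    return (rhs + 2.0 * sigma * rhs.sum()) / (1.0 + 2.0 * sigma * n)
```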

The convergence result of Algorithm 4 is presented in the following theorem, which is taken directly from [11, Theorem 5.1].

Theorem 5

Suppose that the solution set to the KKT system (10) is nonempty. Let \(\{(\theta ^k,\xi ^k,y^k,\eta ^k,u^k,v^k)\}\) be the sequence generated by Algorithm 4. Then \(\{(\theta ^k,\xi ^k,y^k,\eta ^k)\}\) converges to an optimal solution of problem (24), and \(\{(u^k,v^k)\}\) converges to an optimal solution of its dual (D).

More results on comparison of algorithms for solving (P)

Tables 7, 8, 9, 10, 11 and 12 show the comparison among proxALM, sGS-ADMM and MOSEK on instances with relatively large d and n. Note that here we set the stopping criterion to \(R_{\mathrm{KKT}}\le 10^{-6}\) to show that our proposed proxALM is capable of solving the problem (P) to relatively high accuracy. As one can see, when estimating the function \(\psi (x)=\exp (p^T x)\) for moderate \((d,n)=(100,1000)\), proxALM is about 3 times faster than sGS-ADMM, and about 29 times faster than MOSEK. For the case when \(d=100\), \(n=4000\), which is a large problem with 404,000 variables and about 16,000,000 inequality constraints, MOSEK runs out of memory, while proxALM solves it within 7 minutes and sGS-ADMM takes 17 minutes. From the tables, we can see that sGS-ADMM performs much better than MOSEK on each instance, and proxALM performs even better than sGS-ADMM. In most cases, proxALM is at least 10 times faster than MOSEK.

Table 7 Convex regression for test function \(\psi (x)=\exp (p^T x)\), where p is a given random vector with each coordinate drawn from the standard normal distribution
Table 8 Convex regression with monotone constraint (non-decreasing) for the test function \(\psi (x)=(e_d^T x)_{+}\)
Table 9 Convex regression with box constraint (\(L=0_d\), \(U=e_d\)) for the test function \(\psi (x)=\ln (1+\exp (e_d^Tx))\)
Table 10 Convex regression with Lipschitz constraint (\(p=1\), \(q=\infty \), \(L=1\)) for the test function \(\psi (x)=\sqrt{1+x^Tx}\)
Table 11 Convex regression with Lipschitz constraint (\(p=2\), \(q=2\), \(L=\lambda _{\mathrm{max}}(Q)\)) for the test function \(\psi (x)=\sqrt{x^T Qx}\)
Table 12 Convex regression with Lipschitz constraint (\(p=\infty \), \(q=1\), \(L=1\)) for the test function \(\psi (x)=\ln (1+e^{x_1}+\cdots +e^{x_d})\)

Property of basket option of two European call options

The function \(V(x,y)\) is differentiable since it is the solution of the Black–Scholes PDE. By the definition of V, we can see that V is non-decreasing in x and y, which means that \(\nabla V(x,y)\ge 0\). According to the distribution of \(S_T^1\) and \(S_T^2\), we have that

$$\begin{aligned} V(x,y) = e^{-r(T-t)} {\mathbb {E}}_z f(x,y,z), \end{aligned}$$

where

$$\begin{aligned} f(x,y,z)&=(xw_1 e^{(r-\sigma _1^2/2)(T-t)+\sqrt{T-t}z_1}+yw_2 e^{(r-\sigma _2^2/2)(T-t)+\sqrt{T-t}z_2}-K)_+,\\ \begin{pmatrix} z_1\\ z_2 \end{pmatrix}&\sim \mathcal {N}(0,\begin{pmatrix} \sigma _1^2 &{} \rho \sigma _1\sigma _2\\ \rho \sigma _1\sigma _2 &{}\sigma _2^2 \end{pmatrix}). \end{aligned}$$

For any \(x_1,x_2,y\in {\mathbb {R}}\), we can see that

$$\begin{aligned} |V(x_1,y)-V(x_2,y)|&=e^{-r(T-t)} \Big | {\mathbb {E}}_z [f(x_1,y,z)-f(x_2,y,z)]\Big |\\&\le e^{-r(T-t)} {\mathbb {E}}_z |f(x_1,y,z)-f(x_2,y,z)|\\&\le e^{-r(T-t)} {\mathbb {E}}_z [w_1e^{(r-\sigma _1^2/2)(T-t)+\sqrt{T-t}z_1} |x_1-x_2|]\\&=w_1|x_1-x_2|e^{-\sigma _1^2(T-t)/2}{\mathbb {E}}_z [e^{\sqrt{T-t}z_1}]\\&=w_1|x_1-x_2|. \end{aligned}$$

Similarly, we can prove that for any \(x,y_1,y_2\in {\mathbb {R}}\),

$$\begin{aligned} |V(x,y_1)-V(x,y_2)|\le w_2|y_1-y_2|. \end{aligned}$$

Therefore, we have that \(0\le \nabla V(x,y)\le w\) for any \(x,y\).
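
For illustration, the expectation above can also be estimated directly by Monte Carlo simulation. The following is a minimal sketch (function and parameter names are illustrative, and this is not how the option values are computed in the paper):

```python
import numpy as np

def basket_call_value(x, y, w1, w2, K, r, sigma1, sigma2, rho, tau,
                      n_samples=200_000, seed=0):
    # Monte Carlo estimate of V(x, y) = e^{-r*tau} * E_z[f(x, y, z)] with tau = T - t,
    # where (z1, z2) is bivariate normal with the covariance matrix given above.
    rng = np.random.default_rng(seed)
    cov = np.array([[sigma1**2, rho * sigma1 * sigma2],
                    [rho * sigma1 * sigma2, sigma2**2]])
    z = rng.multivariate_normal(mean=np.zeros(2), cov=cov, size=n_samples)
    leg1 = x * w1 * np.exp((r - 0.5 * sigma1**2) * tau + np.sqrt(tau) * z[:, 0])
    leg2 = y * w2 * np.exp((r - 0.5 * sigma2**2) * tau + np.sqrt(tau) * z[:, 1])
    payoff = np.maximum(leg1 + leg2 - K, 0.0)
    return np.exp(-r * tau) * payoff.mean()
```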

A finite difference method for estimating the basket option of two European call options

It is well known that \(V(x,y)=U(0,x,y)\), where U satisfies the Black–Scholes PDE

$$\begin{aligned} \left\{ \begin{aligned}&\frac{\partial U}{\partial t}+rx\frac{\partial U}{\partial x}+ry\frac{\partial U}{\partial y} +\frac{1}{2}\sigma _1^2x^2\frac{\partial ^2 U}{\partial x^2}+\rho \sigma _1\sigma _2xy\frac{\partial ^2 U}{\partial x\partial y}+\frac{1}{2}\sigma _2^2y^2\frac{\partial ^2 U}{\partial y^2}-rU=0,\\&U(T,x,y)=(w_1x+w_2y-K)^+. \end{aligned} \right. \end{aligned}$$

Let \(\tau = T-t\) and \(u(\tau ,x,y)=U(t,x,y)\); then u satisfies

$$\begin{aligned} \left\{ \begin{aligned}&\frac{\partial u}{\partial \tau }-rx\frac{\partial u}{\partial x}-ry\frac{\partial u}{\partial y} -\frac{1}{2}\sigma _1^2x^2\frac{\partial ^2 u}{\partial x^2}-\rho \sigma _1\sigma _2xy \frac{\partial ^2 u}{\partial x\partial y}-\frac{1}{2}\sigma _2^2y^2\frac{\partial ^2 u}{\partial y^2}+ru=0,\\&u(0,x,y)=(w_1x+w_2y-K)^+. \end{aligned} \right. \end{aligned}$$

The above convection-diffusion equation can be solved numerically on a bounded region \((0,x_{\max })\times (0,y_{\max })\) by the standard finite difference method with the artificial boundary conditions

$$\begin{aligned} \left\{ \begin{aligned}&u(\tau ,x,0)=c(w_1x,K,r,\tau ,\sigma _1),\\&u(\tau ,0,y)=c(w_2y,K,r,\tau ,\sigma _2),\\&\frac{\partial }{\partial x}u(\tau ,x_{\max },y)=w_1,\\&\frac{\partial }{\partial y}u(\tau ,x,y_{\max })=w_2,\\ \end{aligned} \right. \end{aligned}$$

where

$$\begin{aligned} c(x,K,r,\tau ,\sigma )=x\varPhi (d_1)-Ke^{-r\tau }\varPhi (d_2),\quad d_{1,2} = \frac{\log \frac{x}{K}+(r\pm \frac{1}{2}\sigma ^2)\tau }{\sigma \sqrt{\tau }}, \end{aligned}$$

and \(\varPhi (\cdot )\) is the cumulative distribution function of the standard normal distribution.
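
The boundary function c above is the usual one-dimensional Black–Scholes call price. A minimal sketch of evaluating it (assuming SciPy is available; names are illustrative):

```python
import numpy as np
from scipy.stats import norm

def bs_call(x, K, r, tau, sigma):
    # Black-Scholes price c(x, K, r, tau, sigma) of a European call,
    # used in the artificial boundary conditions above.
    d1 = (np.log(x / K) + (r + 0.5 * sigma**2) * tau) / (sigma * np.sqrt(tau))
    d2 = d1 - sigma * np.sqrt(tau)
    return x * norm.cdf(d1) - K * np.exp(-r * tau) * norm.cdf(d2)
```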

About this article

Cite this article

Lin, M., Sun, D. & Toh, KC. An augmented Lagrangian method with constraint generation for shape-constrained convex regression problems. Math. Prog. Comp. 14, 223–270 (2022). https://doi.org/10.1007/s12532-021-00210-0
