Computational aspects of column generation for nonlinear and conic optimization: classical and linearized schemes

Abstract

Solving large-scale nonlinear optimization problems requires either significant computing resources or the development of specialized algorithms. For Linear Programming (LP) problems, decomposition methods can take advantage of problem structure, gradually constructing the full problem by generating variables or constraints. We first present a direct adaptation of the Column Generation (CG) methodology for nonlinear optimization problems, such that when optimizing over a structured set \({\mathcal {X}}\) plus a moderate number of complicating constraints, we solve a succession of (1) restricted master problems on a smaller set \({\mathcal {S}}\subset {\mathcal {X}}\) and (2) pricing problems that are Lagrangean relaxations with respect to the complicating constraints. The former provides feasible solutions and feeds dual information to the latter. In turn, the pricing problem identifies a variable of interest that is then incorporated into an updated subset \({\mathcal {S}}'\subset {\mathcal {X}}\). Our approach is valid whenever the master problem has zero Lagrangean duality gap with respect to the complicating constraints, not only when \({\mathcal {S}}\) is the convex hull of the generated variables as in CG for LPs, but also for a variety of subsets such as the conic hull, the linear span, and a special variable aggregation set. We discuss how the structure of \({\mathcal {S}}\) and its update mechanism influence the number of iterations required to reach near-optimality and the difficulty of solving the restricted master problems, and we present linearized schemes that alleviate the computational burden of solving the pricing problem. We test our methods on synthetic portfolio optimization instances with up to 5 million variables, including nonlinear objective functions and second-order cone constraints. We show that some CGs with linearized pricing are 2–3 times faster than solving the complete problem directly and provide solutions within 1% of optimality in 6 h for the larger instances, whereas solving the complete problem runs out of memory.
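
As a rough illustration of the scheme sketched above, the following Python fragment outlines the alternation between restricted master and pricing problems. The callables solve_restricted_master, solve_pricing, and update_subset are hypothetical placeholders for problem-specific solvers; this is a minimal sketch, not the code used in the paper.

```python
def column_generation(solve_restricted_master, solve_pricing, update_subset,
                      S0, tol=1e-6, max_iters=100):
    """Generic column generation loop: alternate a restricted master problem
    over a subset S of X with a Lagrangean pricing problem over the full X.
    The three solver arguments are problem-specific callables (placeholders)."""
    S = S0                                    # current structured subset S of X
    best_ub, best_lb = float("inf"), -float("inf")
    x = y = None
    for _ in range(max_iters):
        # (1) Restricted master P(S): feasible point (x, y), duals of the
        #     complicating constraints, and its optimal value (an upper bound).
        x, y, lam, ub = solve_restricted_master(S)
        best_ub = min(best_ub, ub)
        # (2) Pricing L(X, lam): Lagrangean relaxation over the full set X;
        #     by weak duality its value is a lower bound on the optimum.
        x_new, lb = solve_pricing(lam)
        best_lb = max(best_lb, lb)
        if best_ub - best_lb <= tol * max(1.0, abs(best_ub)):
            break
        # (3) Incorporate the generated column into the updated subset S'.
        S = update_subset(S, x_new)
    return x, y, best_ub, best_lb
```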

Data availability statement

The source codes used to generate the computational results of this work are included in the supplementary material of the paper.

Notes

  1. f and g satisfy some convexity properties in Sect. 3.

  2. Strict feasibility is defined in Subsection 1.8.

  3. This point is explored in more detail in Sects. 4 and 5.

  4. More precisely, projections onto the variables x of a polyhedron in an extended space.

  5. Originally used in a Generalized Benders Decomposition context.

  6. In the LP case, the vector of reduced costs is \(c-X^\top \lambda ^k\) (see problem (4) in Subsection 1.5).

  7. The entropic risk measure is shown to be convex in z in [52].

References

  1. Acerbi, C., Simonetti, P.: Portfolio optimization with spectral measures of risk. arXiv preprint cond-mat/0203607 (2002)

  2. Ahmadi, A.A., Dash, S., Hall, G.: Optimization over structured subsets of positive semidefinite matrices via column generation. Discret. Optim. 24, 129–151 (2017)

  3. Álvarez, C., Mancilla-David, F., Escalona, P., Angulo, A.: A Bienstock–Zuckerberg-based algorithm for solving a network-flow formulation of the convex hull pricing problem. IEEE Trans. Power Syst. 35(3), 2108–2119 (2019)

  4. Barnhart, C., Johnson, E.L., Nemhauser, G.L., Savelsbergh, M.W.P., Vance, P.H.: Branch-and-price: column generation for solving huge integer programs. Oper. Res. 46(3), 316–329 (1998)

  5. Ben-Tal, A., El Ghaoui, L., Nemirovski, A.: Robust Optimization. Princeton University Press, Princeton (2009)

  6. Bergner, M., Caprara, A., Ceselli, A., Furini, F., Lübbecke, M.E., Malaguti, E., Traversi, E.: Automatic Dantzig–Wolfe reformulation of mixed integer programs. Math. Program. 149(1–2), 391–424 (2015)

  7. Bertsimas, D., Tsitsiklis, J.N.: Introduction to Linear Optimization, vol. 6. Athena Scientific, Belmont, MA (1997)

  8. Bienstock, D., Zuckerberg, M.: A new LP algorithm for precedence constrained production scheduling. Optimization Online, pp. 1–33 (2009)

  9. Bonnans, J.F., Gilbert, J.C., Lemaréchal, C., Sagastizábal, C.A.: Numerical Optimization: Theoretical and Practical Aspects. Springer Science & Business Media, Berlin, Germany (2006)

  10. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)

  11. Briant, O., Lemaréchal, C., Meurdesoif, P., Michel, S., Perrot, N., Vanderbeck, F.: Comparison of bundle and classical column generation. Math. Program. 113(2), 299–344 (2008)

  12. Burer, S.: On the copositive representation of binary and continuous nonconvex quadratic programs. Math. Program. 120(2), 479–495 (2009)

  13. Chicoisne, R., Ordoñez, F., Espinoza, D.: Risk averse shortest paths: a computational study. INFORMS J. Comput. 30(3), 539–553 (2018)

  14. Choi, E., Tcha, D.W.: A column generation approach to the heterogeneous fleet vehicle routing problem. Comput. Oper. Res. 34(7), 2080–2095 (2007)

  15. Chou, C.A., Liang, Z., Chaovalitwongse, W.A., Berger-Wolf, T.Y., DasGupta, B., Sheikh, S., Ashley, M.V., Caballero, I.C.: Column-generation framework of nonlinear similarity model for reconstructing sibling groups. INFORMS J. Comput. 27(1), 35–47 (2015)

  16. Dantzig, G.B., Wolfe, P.: The decomposition algorithm for linear programs. Econom. J. Econom. Soc. 767–778 (1961)

  17. Dentcheva, D., Ruszczyński, A.: Portfolio optimization with stochastic dominance constraints. J. Bank. Financ. 30(2), 433–451 (2006)

  18. Desaulniers, G., Desrosiers, J., Dumas, Y., Marc, S., Rioux, B., Solomon, M.M., Soumis, F.: Crew pairing at Air France. Eur. J. Oper. Res. 97(2), 245–259 (1997)

  19. Dong, H., Anstreicher, K.: Separating doubly nonnegative and completely positive matrices. Math. Program. 137(1–2), 131–153 (2013)

  20. Du Merle, O., Villeneuve, D., Desrosiers, J., Hansen, P.: Stabilized column generation. Discret. Math. 194(1–3), 229–237 (1999)

  21. Eckstein, J., Bertsekas, D.P.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55(1–3), 293–318 (1992)

  22. Espinoza, D., Moreno, E.: A primal-dual aggregation algorithm for minimizing conditional value-at-risk in linear programs. Comput. Optim. Appl. 59(3), 617–638 (2014)

  23. García, R., Marín, A., Patriksson, M.: Column generation algorithms for nonlinear optimization, I: convergence analysis. Optimization 52(2), 171–200 (2003)

  24. García, R., Marín, A., Patriksson, M.: Column generation algorithms for nonlinear optimization, II: numerical investigations. Comput. Oper. Res. 38(3), 591–604 (2011)

  25. Geoffrion, A.: Generalized Benders decomposition. J. Optim. Theory Appl. 10(4), 237–260 (1972)

  26. Giles, F.R., Pulleyblank, W.R.: Total dual integrality and integer polyhedra. Linear Algebra Appl. 25, 191–196 (1979)

  27. Glover, F.: Surrogate constraint duality in mathematical programming. Oper. Res. 23(3), 434–451 (1975)

  28. Goemans, M.X., Williamson, D.P.: Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM (JACM) 42(6), 1115–1145 (1995)

  29. Gorge, A., Lisser, A., Zorgati, R.: Generating cutting planes for the semidefinite relaxation of quadratic programs. Comput. Oper. Res. 55, 65–75 (2015)

  30. Greenberg, H., Pierskalla, W.: Surrogate mathematical programming. Oper. Res. 18(5), 924–939 (1970)

  31. Khaniyev, T., Elhedhli, S., Erenay, F.S.: Structure detection in mixed-integer programs. INFORMS J. Comput. 30(3), 570–587 (2018)

  32. Kleywegt, A.J., Shapiro, A., Homem-de-Mello, T.: The sample average approximation method for stochastic discrete optimization. SIAM J. Optim. 12(2), 479–502 (2002)

  33. Krokhmal, P., Palmquist, J., Uryasev, S.: Portfolio optimization with conditional value-at-risk objective and constraints. J. Risk 4, 43–68 (2002)

  34. Levy, H., Markowitz, H.M.: Approximating expected utility by a function of mean and variance. Am. Econ. Rev. 69(3), 308–317 (1979)

  35. Lovász, L.: On the Shannon capacity of a graph. IEEE Trans. Inf. Theory 25(1), 1–7 (1979)

  36. Lübbecke, M., Desrosiers, J.: Selected topics in column generation. Oper. Res. 53(6), 1007–1023 (2005)

  37. Müller, B., Muñoz, G., Gasse, M., Gleixner, A., Lodi, A., Serrano, F.: On generalized surrogate duality in mixed-integer nonlinear programming. In: International Conference on Integer Programming and Combinatorial Optimization, pp. 322–337. Springer (2020)

  38. Muñoz, G., Espinoza, D., Goycoolea, M., Moreno, E., Queyranne, M., Rivera, O.: A study of the Bienstock–Zuckerberg algorithm: applications in mining and resource constrained project scheduling. Comput. Optim. Appl. 69(2), 501–534 (2018)

  39. Murphy, F.H.: A column generation algorithm for nonlinear programming. Math. Program. 5(1), 286–298 (1973)

  40. Muts, P., Nowak, I., Hendrix, E.M.: On decomposition and multiobjective-based column and disjunctive cut generation for MINLP. Optim. Eng. 22(3), 1389–1418 (2021)

  41. Nesterov, Y., Nemirovskii, A.: Interior-point polynomial algorithms in convex programming. SIAM (1994)

  42. Ni, W., Shu, J., Song, M., Xu, D., Zhang, K.: A branch-and-price algorithm for facility location with general facility cost functions. INFORMS J. Comput. 33(1), 86–104 (2021)

  43. Nocedal, J., Wright, S.: Numerical Optimization. Springer Science & Business Media, Berlin, Germany (2006)

  44. Nowak, I., Breitfeld, N., Hendrix, E.M., Njacheun-Njanzoua, G.: Decomposition-based inner- and outer-refinement algorithms for global optimization. J. Glob. Optim. 72(2), 305–321 (2018)

  45. Park, Y.W.: Optimization for \(\ell _1\)-norm error fitting via data aggregation. INFORMS J. Comput. 33(1), 120–142 (2021)

  46. Pessoa, A., Sadykov, R., Uchoa, E., Vanderbeck, F.: Automation and combination of linear-programming based stabilization techniques in column generation. INFORMS J. Comput. 30(2), 339–360 (2018)

  47. Petra, C.G., Schenk, O., Anitescu, M.: Real-time stochastic optimization of complex energy systems on high-performance computers. Comput. Sci. Eng. 16(5), 32–42 (2014)

  48. Petra, C.G., Schenk, O., Lubin, M., Gärtner, K.: An augmented incomplete factorization approach for computing the Schur complement in stochastic optimization. SIAM J. Sci. Comput. 36(2), C139–C162 (2014)

  49. Pirnay, H., Lopez-Negrete, R., Biegler, L.: Optimal sensitivity based on IPOPT. Math. Program. Comput. 4(4), 307–331 (2012)

  50. Pisinger, W.D., Rasmussen, A.B., Sandvik, R.: Solution of large quadratic knapsack problems through aggressive reduction. INFORMS J. Comput. 19(2), 280–290 (2007)

  51. Porumbel, D., Clautiaux, F.: Constraint aggregation in column generation models for resource-constrained covering problems. INFORMS J. Comput. 29(1), 170–184 (2017)

  52. Pratt, J.W.: Risk aversion in the small and in the large. Econom. J. Econom. Soc. 32(1/2), 122–136 (1964)

  53. Ruszczyński, A.: On convergence of an augmented Lagrangian decomposition method for sparse convex optimization. Math. Oper. Res. 20(3), 634–656 (1995)

  54. Sadykov, R., Lazarev, A., Shiryaev, V., Stratonnikov, A.: Solving a freight railcar flow problem arising in Russia. In: ATMOS-13th Workshop on Algorithmic Approaches for Transportation Modelling, Optimization, and Systems-2013. Dagstuhl Open Access Series in Informatics (2013)

  55. Sadykov, R., Vanderbeck, F.: Column generation for extended formulations. EURO J. Comput. Optim. 1(1–2), 81–115 (2013)

  56. Song, Y., Luedtke, J.: An adaptive partition-based approach for solving two-stage stochastic programs with fixed recourse. SIAM J. Optim. 25(3), 1344–1367 (2015)

  57. Sponsel, J., Dür, M.: Factorization and cutting planes for completely positive matrices by copositive projection. Math. Program. 143(1–2), 211–229 (2014)

  58. Sun, Y., Andersen, M.S., Vandenberghe, L.: Decomposition in conic optimization with partially separable structure. SIAM J. Optim. 24(2), 873–897 (2014)

  59. Vandenberghe, L., Andersen, M.S.: Chordal graphs and semidefinite optimization. Found. Trends Optim. 1(4), 241–433 (2015)

  60. Vielma, J.P., Ahmed, S., Nemhauser, G.L.: A lifted linear programming branch-and-bound algorithm for mixed-integer conic quadratic programs. INFORMS J. Comput. 20(3), 438–450 (2008)

  61. Von Hohenbalken, B.: Simplicial decomposition in nonlinear programming algorithms. Math. Program. 13(1), 49–68 (1977)

  62. Wächter, A., Biegler, L.: On the implementation of a primal-dual interior point filter line search algorithm for large-scale nonlinear programming. Math. Program. 106(1), 25–57 (2006)

  63. Wang, J., Ralphs, T.: Computational experience with hypergraph-based methods for automatic decomposition in discrete optimization. In: International Conference on AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, pp. 394–402. Springer (2013)

  64. Wang, Y., Yin, W., Zeng, J.: Global convergence of ADMM in nonconvex nonsmooth optimization. J. Sci. Comput. 78(1), 29–63 (2019)

  65. Zangwill, W.I.: The convex simplex method. Manag. Sci. 14(3), 221–238 (1967)

  66. Zheng, Y., Fantuzzi, G., Papachristodoulou, A., Goulart, P., Wynn, A.: Fast ADMM for semidefinite programs with chordal sparsity. In: 2017 American Control Conference (ACC), pp. 3335–3340. IEEE (2017)

  67. Zheng, Y., Fantuzzi, G., Papachristodoulou, A., Goulart, P., Wynn, A.: Chordal decomposition in operator-splitting methods for sparse semidefinite programs. Math. Program. 180(1), 489–532 (2020)

  68. Zheng, Y., Sootla, A., Papachristodoulou, A.: Block factor-width-two matrices and their applications to semidefinite and sum-of-squares optimization. IEEE Transactions on Automatic Control (2022)

Acknowledgements

The author thanks Victor Bucarey, Bernard Fortz, Bernard Gendron (this paper is dedicated to his memory; he passed away in July 2022), Gonzalo Muñoz, Fernando Ordóñez, Dana Pizarro, Jupiler™, and two anonymous reviewers for their valuable comments on an early version of this work. Powered@NLHPC: this research was partially supported by the supercomputing infrastructure of the NLHPC (ECM-02).

Author information

Corresponding author

Correspondence to Renaud Chicoisne.

Ethics declarations

Conflict of interest

The author declares that he has no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Proof of Example 4

We necessarily have \(y_2=x_3=x_4=x_5=x_6=0\), \(y_1=1-x_1\) and \(x_2=1\), which turns the SDP constraint into \(x_1\geqslant 0\) and we have \(\omega ({\mathcal {X}})=0\). The conic dual of the latter problem is

$$\begin{aligned} \omega ':=\max _{\lambda }\left\{ -\lambda _1-\lambda _2 : \begin{pmatrix} \lambda _1&{}\lambda _4&{}\lambda _5\\ \lambda _4&{}\lambda _2&{}\lambda _6\\ \lambda _5&{}\lambda _6&{}\lambda _3 \end{pmatrix} \succeq 0,\quad 1-\lambda _2+2\lambda _5=0,\quad \lambda _1=0 \right\} . \end{aligned}$$

We necessarily have \(\lambda _1=\lambda _4=\lambda _5=0\), \(\lambda _2=1\), which turns the SDP constraint into \(\lambda _3\geqslant \lambda _6^2\) and we have \(\omega '=-1<0=\omega ({\mathcal {X}})\), proving that \(P({\mathcal {X}})\) does not satisfy Assumption 1.

Now, by construction \({\mathcal {S}}\subset {\mathcal {X}}\) and \(P({\mathcal {S}})\) is a feasible LP that always satisfies Assumption 1. \(\square\)

Appendix B: Proof of Proposition 1

We prove that the point \(({{\tilde{x}}}, {{\bar{y}}})\) constructed below is strictly feasible for \(P({\mathcal {S}})\). Because \(-g({{\bar{x}}},{{\bar{y}}})\in \text {relint }{\mathcal {C}}\) and \({\mathcal {C}}\) is proper, there is \(\rho >0\) such that \({\mathcal {B}}(-g({{\bar{x}}},\bar{y}),\rho )\subset \text {relint }{\mathcal {C}}\). Consider \(({{\hat{x}}},\hat{\theta })\) strictly feasible for \({\mathcal {S}}\), and define \(\epsilon >0\) and \({\tilde{x}}\) as follows:

$$\begin{aligned} \epsilon = \min \left\{ 1, \frac{\rho }{L || {{\hat{x}}} - {{\bar{x}}} ||}\right\} ,&\quad {{\tilde{x}}} = {{\bar{x}}} + \epsilon ({{\hat{x}}} - {{\bar{x}}}). \end{aligned}$$

Because \({{\bar{x}}}\in {\mathcal {S}}\) and \(({{\hat{x}}},{{\hat{\theta }}})\) is strictly feasible for \({\mathcal {S}}\), there exists \({{\bar{\theta }}}\) such that \(\phi ({{\bar{x}}}, {{\bar{\theta }}})=0\), \(-\gamma ({{\bar{x}}}, {{\bar{\theta }}})\in {\mathcal {K}}\), \(\phi ({{\hat{x}}}, {{\hat{\theta }}})=0\) and \(-\gamma ({{\hat{x}}}, {{\hat{\theta }}})\in \text {relint }{\mathcal {K}}\). \(\phi\) being linear, defining \({{\tilde{\theta }}} = {{\bar{\theta }}}+\epsilon ({{\hat{\theta }}} - \bar{\theta })\), we immediately have \(\phi ({{\tilde{x}}},{{\tilde{\theta }}})=0\). Further, \(\epsilon >0\), \({\mathcal {K}}\) is proper, \(-\gamma ({{\bar{x}}},\bar{\theta }) \in {\mathcal {K}}\) and \(-\gamma ({{\hat{x}}},{{\hat{\theta }}}) \in \text {relint }{\mathcal {K}}\) so we obtain \(-\epsilon \gamma ({{\hat{x}}},{{\hat{\theta }}})-(1-\epsilon ) \gamma ({{\bar{x}}},\bar{\theta })\in \text {relint }{\mathcal {K}}\). Because \(\gamma\) is \({\mathcal {K}}\)-convex and \(\epsilon \in ]0,1]\) we obtain \(-\gamma ({{\tilde{x}}},{{\tilde{\theta }}})\in \text {relint }{\mathcal {K}}\). We just proved that \(({{\tilde{x}}},{{\tilde{\theta }}})\) is strictly feasible for \({\mathcal {S}}\). We finish by proving that \(({{\tilde{x}}}, {\bar{y}})\) is strictly feasible for \(P({\mathcal {S}})\): By definition we have \(\rho \geqslant L\epsilon || {{\hat{x}}} - {{\bar{x}}} ||=L||{{\tilde{x}}}- {{\bar{x}}}|| =L||({{\tilde{x}}}- {{\bar{x}}};{{\bar{y}}}- {{\bar{y}}})||\). Because g is L-Lipschitz, we obtain that \(\rho \geqslant ||g({{\tilde{x}}},{{\bar{y}}}) - g({{\bar{x}}},{{\bar{y}}})||\), meaning that \(-g({{\tilde{x}}},{{\bar{y}}}) \in {\mathcal {B}}(-g({\bar{x}}, {{\bar{y}}}),\rho )\subset \text {relint }{\mathcal {C}}\). Because \(P({\mathcal {S}})\) is convex, Slater’s condition holds, thus finishing the proof. \(\square\)

Appendix C: Proof of Proposition 2

If \(L({\mathcal {X}},\lambda ^k)\) satisfies the P-property with respect to y, the problem in y given some \(x\in {\mathcal {X}}\) is equivalent to the problem in y given \(x=x^k\), i.e. for any \(x\in {\mathcal {X}}\) we have

$$\begin{aligned}&\arg \min \limits _{y}\left\{ f\left( x,y\right) +\left\langle \lambda ^k , g\left( x,y\right) \right\rangle \right\} \\ \nonumber =&\arg \min \limits _{y}\left\{ f\left( x^k,y\right) + \left\langle \lambda ^k , g\left( x^k,y\right) \right\rangle \right\} . \end{aligned}$$
(13)

Because \(P({\mathcal {S}}^k)\) satisfies Assumption 1, \((x^k,y^k)\) is optimal for \(L({\mathcal {S}}^k,\lambda ^k)\), i.e.

$$\begin{aligned} \omega ({\mathcal {S}}^k,\lambda ^k)= \min _{x\in {\mathcal {S}}^k,y}\left\{ f\left( x,y\right) +\left\langle \lambda ^k , g\left( x,y\right) \right\rangle \right\} =f\left( x^k,y^k\right) +\left\langle \lambda ^k , g\left( x^k,y^k\right) \right\rangle . \end{aligned}$$

In consequence, \(y^k\in \arg \min _{y}\{f(x^k,y)+\langle \lambda ^k , g(x^k,y)\rangle \}\). Using equality (13) finishes the proof. \(\square\)

Appendix D: Proof of Lemma 1

First, for any \(t\in [0,1]\) and any pair \((u^1,u^2)\in {\mathcal {U}}\times {\mathcal {U}}\) we have: \(t\varphi (u^1) +(1-t)\varphi (u^2) -\varphi (t u^1+(1-t)u^2) \in {\mathcal {K}}\). Given that \(\lambda \in {\mathcal {K}}^*\) we have \(\langle \lambda , t\varphi (u^1) +(1-t)\varphi (u^2) -\varphi (t u^1+(1-t)u^2) \rangle \geqslant 0\), meaning that

$$\begin{aligned} t \underbrace{ \left\langle \lambda , \varphi \left( u^1\right) \right\rangle }_{ \psi \left( u^1\right) } +(1-t) \underbrace{ \left\langle \lambda , \varphi \left( u^2\right) \right\rangle }_{ \psi \left( u^2\right) } \geqslant \underbrace{ \left\langle \lambda , \varphi \left( t u^1+(1-t)u^2\right) \right\rangle }_{ \psi \left( t u^1+(1-t)u^2\right) }. \end{aligned}$$
\(\square\)

Appendix E: Proof of Lemma 2

The linear approximation of \(\psi\) at \({{\bar{u}}}\) is \({{\bar{\psi }}}[\bar{u}](u):= \psi ({{\bar{u}}})+ \langle \nabla \psi [{{\bar{u}}} ],u-{{\bar{u}}}\rangle\). By definition, for every \(u\in {\mathcal {U}}\):

$$\begin{aligned} \left\langle \nabla \psi \left[ {{\bar{u}}} \right] ,u-{{\bar{u}}}\right\rangle =&\lim \limits _{\epsilon \rightarrow 0} \frac{\psi \left( {{\bar{u}}} +\epsilon \left( u-{{\bar{u}}}\right) \right) -\psi \left( {{\bar{u}}}\right) }{\epsilon }\\ =&\lim \limits _{\epsilon \rightarrow 0} \frac{ \left\langle \lambda , \varphi \left( {{\bar{u}}} +\epsilon \left( u-{{\bar{u}}}\right) \right) \right\rangle -\left\langle \lambda , \varphi \left( {{\bar{u}}}\right) \right\rangle }{\epsilon }\\ =&\left\langle \lambda , \lim \limits _{\epsilon \rightarrow 0} \frac{ \varphi \left( {{\bar{u}}} +\epsilon \left( u-{{\bar{u}}}\right) \right) -\varphi \left( {{\bar{u}}}\right) }{\epsilon } \right\rangle = \left\langle \lambda , D\varphi \left( {{\bar{u}}} \right) \left( u-{{\bar{u}}}\right) \right\rangle , \end{aligned}$$

implying that \({{\bar{\psi }}}[{{\bar{u}}}](u)= \langle \lambda , \varphi (\bar{u})\rangle + \langle \lambda , D\varphi ({{\bar{u}}} ) (u-{{\bar{u}}}) \rangle =\langle \lambda , {{\bar{\varphi }}}[{{\bar{u}}}](u)\rangle\). \(\square\)

Appendix F: Proof of Lemma 3

By convexity of \({\mathcal {U}}\) and optimality of \(u^*\), for any \(\epsilon \in ]0,1]\) and any \(u\in {\mathcal {U}}\) we have \(\varphi (u^ *)\leqslant \varphi (u^ *+\epsilon (u-u^ *))\), i.e.

$$\begin{aligned} \frac{\varphi \left( u^ *+\epsilon \left( u-u^*\right) \right) -\varphi \left( u^*\right) }{\epsilon }\geqslant 0,&\quad \forall \epsilon \in ]0,1], \forall u \in {\mathcal {U}}, \end{aligned}$$

which implies that when \(\epsilon \rightarrow 0\), for any \(u \in {\mathcal {U}}\) we have \(\left\langle \nabla \varphi \left( u^ *\right) ,u-u^ *\right\rangle \geqslant 0\), i.e.

$$\begin{aligned} \underbrace{ \varphi \left( u^*\right) +\left\langle \nabla \varphi \left( u^ *\right) ,u-u^ *\right\rangle }_{ \bar{\varphi }\left[ u^*\right] (u) } \geqslant \underbrace{ \varphi \left( u^*\right) +\left\langle \nabla \varphi \left( u^ *\right) ,u^*-u^ *\right\rangle }_{ \bar{\varphi }\left[ u^*\right] \left( u^*\right) },&\quad \forall u \in {\mathcal {U}}, \end{aligned}$$

meaning that \(u^*\) is optimal for \(\min _{u\in {\mathcal {U}}} \bar{\varphi }[u^*](u)\) and implying \(\omega ^ *= {{\bar{\omega }}}[u^ *]\). \(\square\)

Appendix G: Proof of Proposition 3

First, notice that the following holds:

$$\begin{aligned} {{\bar{f}}} \left[ x^k,y^k\right] (x,y)=&f\left( x^k,y^k\right) +\left\langle \nabla _x f\left( x^k,y^k\right) ,x-x^k\right\rangle +\left\langle \nabla _y f\left( x^k,y^k\right) ,y-y^k\right\rangle \\ {{\bar{g}}} \left[ x^k,y^k\right] (x,y)=&g\left( x^k,y^k\right) +D_x g\left( x^k,y^k\right) \left( x-x^k\right) +D_y g\left( x^k,y^k\right) \left( y-y^k\right) \end{aligned}$$

In consequence, we have that:

$$\begin{aligned} \nonumber \bar{\omega }\left[ x^{k},y^k\right] \left( {\mathcal {X}},\lambda ^{k}\right) =&f\left( x^k,y^k\right) -\left\langle \nabla _x f\left( x^k,y^k\right) ,x^k\right\rangle -\left\langle \nabla _y f\left( x^k,y^k\right) ,y^k\right\rangle \\ \nonumber&+\left\langle \lambda ^{k} , g\left( x^k,y^k\right) -D_x g\left( x^k,y^k\right) x^k -D_y g\left( x^k,y^k\right) y^k\right\rangle \\ \nonumber&+\min \limits _{x\in {\mathcal {X}}} \left\{ \left\langle \nabla _x f\left( x^k,y^k\right) ,x\right\rangle +\left\langle \lambda ^{k} , D_x g\left( x^k,y^k\right) x\right\rangle \right\} \\&+\min \limits _{y} \left\{ \left\langle \nabla _y f\left( x^k,y^k\right) ,y\right\rangle +\left\langle \lambda ^{k} , D_y g\left( x^k,y^k\right) y\right\rangle \right\} , \end{aligned}$$
(14)

which is separable in x and y. Without loss of generality, we can assume that (14) attains its minimum; otherwise the master problem is unbounded and we can stop. Under this assumption, the first-order optimality conditions in y for the master problem \(P({\mathcal {S}}^k)\) are \(\nabla _y f(x^k,y^k)+D_y g ( x^k,y^k)^*\lambda ^k =0\), in turn implying that for any y: \(\langle \nabla _y f(x^k,y^k),y\rangle +\langle \lambda ^k,D_y g ( x^k,y^k) y\rangle =0\), meaning that the objective of the minimization in y in (14) is identically zero. \(\square\)

Appendix H: Proof of Proposition 5

Let us define the following subsets of [p]:

$$\begin{aligned} {\mathcal {J}}_+:=\left\{ j\in [p]: \gamma _j<0, \alpha _j>0 \right\} ,&\quad {\mathcal {J}}_-:=\left\{ j\in [p]: \gamma _j>0, \alpha _j<0 \right\} ,\\ {\mathcal {J}}_0:=\left\{ j\in [p]: \gamma _j, \alpha _j\geqslant 0 \right\} ,&\quad {\mathcal {J}}_\mu :=\left\{ j\in [p]: \gamma _j, \alpha _j\leqslant 0 \right\} . \end{aligned}$$

First notice that we can fix beforehand the following variables:

$$\begin{aligned} z^*_j = {\left\{ \begin{array}{ll} \mu _j&{}\text { if } j\in {\mathcal {J}}_\mu \\ 0&{}\text { if } j\in {\mathcal {J}}_0. \end{array}\right. } \end{aligned}$$

Next, for every \(j\in {\mathcal {J}}_-\), we use the change of variable \(z_j\leftarrow \mu _j-z_j\), obtaining the following problem:

$$\begin{aligned} \omega ^*:= \sum \limits _{j\in {\mathcal {J}}_-\cup {\mathcal {J}}_\mu } \gamma _j\mu _j + \min \limits _{z}\quad&\sum \limits _{j\in {\mathcal {J}}_+\cup {\mathcal {J}}_-} {{\hat{\gamma }}}_jz_j \\ \text {s.t.:}\quad&\sum \limits _{j\in {\mathcal {J}}_+\cup {\mathcal {J}}_-} {{\hat{\alpha }}}_jz_j \leqslant {{\hat{\beta }}} \\&z_j\in \left[ 0,\mu _j\right] ,&\forall j \in {\mathcal {J}}_+\cup {\mathcal {J}}_- \end{aligned}$$

where \({{\hat{\beta }}}:= \beta -\sum _{j\in {\mathcal {J}}_-\cup {\mathcal {J}}_\mu } \alpha _j \mu _j\) and:

$$\begin{aligned} {{\hat{\alpha }}}_j:= {\left\{ \begin{array}{ll} \alpha _j&{} \text { if } j\in {\mathcal {J}}_+\\ -\alpha _j&{} \text { if } j\in {\mathcal {J}}_- \end{array}\right. } \qquad {{\hat{\gamma }}}_j:= {\left\{ \begin{array}{ll} \gamma _j &{}\text { if } j\in {\mathcal {J}}_+\\ -\gamma _j&{}\text { if } j\in {\mathcal {J}}_-. \end{array}\right. } \end{aligned}$$

The latter is a knapsack problem with positive capacity \({{\hat{\beta }}}\) and weights \({{\hat{\alpha }}}_j\) that can be solved by sorting the remaining indices \(j\in {\mathcal {J}}_+\cup {\mathcal {J}}_-\) in increasing disutility \({{\hat{\gamma }}}_j/{{\hat{\alpha }}}_j\) and filling the capacity constraint until no variable is available or the capacity constraint is tight. \(\square\)
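
For concreteness, the following NumPy sketch implements the variable fixing, the change of variable on \({\mathcal {J}}_-\), and the greedy filling described in this proof. It is an illustrative implementation of the argument, not the code used in the paper, and the names mirror the proof's notation.

```python
import numpy as np

def pricing_knapsack(gamma, alpha, beta, mu):
    """Greedy procedure from the proof of Proposition 5 for
       min_z  <gamma, z>  s.t.  <alpha, z> <= beta,  0 <= z_j <= mu_j."""
    gamma = np.asarray(gamma, float)
    alpha = np.asarray(alpha, float)
    mu = np.asarray(mu, float)
    z = np.zeros_like(mu)

    # Index sets of the proof
    J_plus = (gamma < 0) & (alpha > 0)
    J_minus = (gamma > 0) & (alpha < 0)
    J_mu = (gamma <= 0) & (alpha <= 0)

    # Variables fixed beforehand: z_j = mu_j on J_mu, z_j = 0 on J_0
    z[J_mu] = mu[J_mu]
    # Change of variable z_j <- mu_j - z_j on J_minus (hat variable starts at 0)
    z[J_minus] = mu[J_minus]

    g_hat = np.where(J_minus, -gamma, gamma)   # <= 0 on J_plus and J_minus
    a_hat = np.where(J_minus, -alpha, alpha)   # >  0 on J_plus and J_minus
    cap = beta - alpha[J_minus | J_mu] @ mu[J_minus | J_mu]   # residual capacity beta_hat

    # Greedy fill by increasing disutility g_hat / a_hat until the capacity is tight
    free = np.flatnonzero(J_plus | J_minus)
    for j in free[np.argsort(g_hat[free] / a_hat[free])]:
        if cap <= 0:
            break
        take = min(mu[j], cap / a_hat[j])
        cap -= a_hat[j] * take
        z[j] = mu[j] - take if J_minus[j] else take   # undo the change of variable

    return z, float(gamma @ z)
```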

Appendix I: Proof of Proposition 6

It is enough to show that \((u^*,( \pi ^*,\hat{\lambda },{\hat{\lambda }}_0))\) satisfies the KKT conditions for (9):

$$\begin{aligned} \phi \left( u^*\right) \leqslant 0,\quad -\left( \gamma \left( u^*\right) ,\gamma _0\right) \in {\mathcal {L}}_2 \end{aligned}$$
(15a)
$$\begin{aligned} \pi ^*\geqslant 0,\quad \phi \left( u^*\right) ^\top \pi ^*= 0 \end{aligned}$$
(15b)
$$\begin{aligned} \left( {{\hat{\lambda }}},{\hat{\lambda }}_0\right) \in {\mathcal {L}}_2 \end{aligned}$$
(15c)
$$\begin{aligned} \gamma \left( u^*\right) ^\top \hat{\lambda }+\gamma _0 {{\hat{\lambda }}}_0 = 0 \end{aligned}$$
(15d)
$$\begin{aligned} \nabla \varphi \left( u^*\right) + D\phi \left( u^*\right) ^\top \pi ^*+ D\gamma \left( u^*\right) ^\top {{\hat{\lambda }}} = 0. \end{aligned}$$
(15e)

(15a) and (15b) are trivially satisfied. Given that \(-\gamma _0,\lambda ^*\geqslant 0\), the remaining conditions are equivalent to:

$$\begin{aligned} \left( \lambda ^*\right) ^2\left( \left| \left| \gamma \left( u^*\right) \right| \right| _2^2 -\gamma ^2_0\right) \leqslant 0\\ \lambda ^*\left( \left| \left| \gamma \left( u^*\right) \right| \right| _2^2 -\gamma ^2_0\right) = 0\\ \nabla \varphi \left( u^*\right) + D\phi \left( u^*\right) ^\top \pi ^*+ 2\lambda ^*D\gamma \left( u^*\right) ^\top \gamma \left( u^*\right) = 0, \end{aligned}$$

which are all implied by the KKT conditions for (10) at \((u^*,(\pi ^*,\lambda ^*))\). \(\square\)

Appendix J: Sparsity patterns for the enhanced models

Primal-dual interior point and Newton step. Given a barrier parameter \(\mu >0\) and an optimization problem \(\min _{u}\{\varphi (u):\phi (u)\leqslant 0,\,Ru=s\}\), the barrier problem is defined as \(\min _{u}\{\varphi (u)-\mu \sum _{i=1}^p\ln (-\phi _i(u)):Ru=s\}\). The first-order optimality conditions for the barrier problem are the following perturbed conditions:

$$\begin{aligned} \nabla \varphi (u)+R^\top \pi +\beta ^\top D\phi (u)=0\\ \mu +\beta _i\phi _i(u)=0\\ Ru=s. \end{aligned}$$

The heavy computational work at each iteration of a primal-dual interior point algorithm occurs during the Newton step, which consists of approximately solving the latter system of equations by solving, in \((\Delta u,\Delta \beta , \Delta \pi )\), its first-order approximation:

$$\begin{aligned} \begin{pmatrix} \nabla ^2 \varphi (u) +\sum \limits _{i=1}^p\beta _i\nabla ^2\phi _i(u) &{} D\phi (u)^\top &{} R^\top \\ -\text {diag}(\beta )D\phi (u) &{} -\text {diag}(\phi (u)) &{} 0\\ R &{} 0 &{} 0 \end{pmatrix} \begin{pmatrix} \Delta u\\ \Delta \beta \\ \Delta \pi \end{pmatrix} + \begin{pmatrix} \nabla \varphi (u)+R^\top \pi +\beta ^\top D\phi (u)\\ \left( \mu +\beta _i\phi _i(u)\right) _{i\in [p]}\\ Ru-s \end{pmatrix} =0 \end{aligned}$$
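
To make this step concrete, here is a minimal dense linear-algebra sketch that assembles and solves the above system with NumPy. The callables grad_phi0, hess_phi0, phi, jac_phi, and hess_phi_i are hypothetical placeholders for the problem data, and the sketch deliberately ignores the sparsity structure discussed next.

```python
import numpy as np

def newton_step(u, beta, pi, mu, grad_phi0, hess_phi0, phi, jac_phi, hess_phi_i, R, s):
    """One (dense) Newton step of a primal-dual interior point method for
       min phi0(u)  s.t.  phi(u) <= 0,  R u = s,
    solving the linear system displayed above. Illustrative sketch only."""
    n, p, m = u.size, beta.size, pi.size
    Jp = jac_phi(u)                                   # p x n Jacobian of the inequalities
    H = hess_phi0(u) + sum(beta[i] * hess_phi_i(i, u) for i in range(p))
    K = np.block([
        [H,                    Jp.T,               R.T],
        [-np.diag(beta) @ Jp,  -np.diag(phi(u)),   np.zeros((p, m))],
        [R,                    np.zeros((m, p)),   np.zeros((m, m))],
    ])
    rhs = -np.concatenate([
        grad_phi0(u) + R.T @ pi + Jp.T @ beta,        # stationarity residual
        mu + beta * phi(u),                           # perturbed complementarity
        R @ u - s,                                    # primal equality residual
    ])
    delta = np.linalg.solve(K, rhs)
    return delta[:n], delta[n:n + p], delta[n + p:]   # (du, dbeta, dpi)
```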

Monolithic formulation. Figure 8 summarizes the number of nonzero coefficients per block in the Newton step's system, and the corresponding sparsity patterns without and with the numerical enhancement are displayed in Figs. 9 and 10.

Fig. 8
Number of nonzeroes per block: Non-enhanced model vs. enhanced

The non-enhanced version has \(n^2+10n+T+2\) nonzeroes and \(3n+T+2\) unknowns, while the enhanced version has \(n(4S+8)+S(S(T+1)+2T+7)+2+T\) nonzeroes and \(3n+2ST+4S+T+2\) unknowns. We summarize these results in Table 10 to illustrate the benefits of using the enhanced version instead of the original one (recall that we used \(S=20\)):

Table 10 Benefits of enhancing the master problem on our instances

In our instances, the enhanced version thus reduces the number of nonzero coefficients by a factor of at least 100, at the cost of less than one percent of additional unknowns.
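
As a quick sanity check of these counts, the short snippet below evaluates both formulas. The dimensions n and T used in the example call are illustrative placeholders, not the paper's instance sizes.

```python
def nnz_counts(n, T, S=20):
    """Nonzero and unknown counts of the Newton system for the monolithic master,
    using the formulas stated above (non-enhanced vs. enhanced)."""
    plain = {"nnz": n**2 + 10*n + T + 2, "unknowns": 3*n + T + 2}
    enhanced = {"nnz": n*(4*S + 8) + S*(S*(T + 1) + 2*T + 7) + 2 + T,
                "unknowns": 3*n + 2*S*T + 4*S + T + 2}
    return plain, enhanced

# Illustrative sizes only (hypothetical, not the paper's instances)
plain, enh = nnz_counts(n=100_000, T=50)
print(plain["nnz"] / enh["nnz"])            # nonzero reduction factor
print(enh["unknowns"] / plain["unknowns"])  # relative growth in unknowns
```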

Fig. 9
Sparsity pattern of the Newton step's matrix for P without numerical enhancement

Fig. 10
Sparsity pattern of the Newton step's matrix for P with numerical enhancement

Pricing problems. We only cover the pricing problem in which the variance constraint is treated as conic, since the pricing obtained when treating it as a single nonlinear constraint is very similar to the monolithic formulation. Given that the problem is separable in each \(x^t\) (and \(x^0=y\)), we successively solve the pricing problem for each \(t\in \{0,...,T\}\); this is the problem whose sparsity pattern we study. Figure 11 summarizes the number of nonzero coefficients per block in the Newton step's system, and Fig. 12 displays the corresponding sparsity patterns.

Fig. 11
Number of nonzeroes per block: Non-enhanced model vs. enhanced

The non-enhanced version has \(n_t^2+8n_t+1\) nonzeroes and \(3n_t+1\) unknowns, while the enhanced version has \(n_t(8+2S)+S(S+2)+1\) nonzeroes and \(3n_t+2S+1\) unknowns. We summarize these results in Table 11 to illustrate the benefits of using the enhanced version instead of the original one: again, the number of nonzero coefficients in our instances is reduced by a factor of at least 100, at the cost of less than one percent of additional unknowns.

Fig. 12
Sparsity patterns for the t-th pricing problem

Table 11 Benefits of enhancing the pricing problems on our instances

Appendix K: Proof of Proposition 7

The KKT conditions for (12) are

$$\begin{aligned} \phi \left( {{\tilde{u}}}\right) \leqslant 0,\quad -{{\tilde{\gamma }}}\left( {{\tilde{w}}}\right) \in {\mathcal {K}},\quad {{\tilde{v}}}=V{{\tilde{u}}},\quad {{\tilde{w}}}= W{{\tilde{u}}} \end{aligned}$$
(16a)
$$\begin{aligned} {{\tilde{\pi }}}\geqslant 0,\quad {{\tilde{\lambda }}}\in {\mathcal {K}}^*\end{aligned}$$
(16b)
$$\begin{aligned} \phi \left( {{\tilde{u}}}\right) ^\top {{\tilde{\pi }}} = 0,\quad \left\langle {{\tilde{\gamma }}}\left( {{\tilde{w}}}\right) ,{{\tilde{\lambda }}}\right\rangle = 0 \end{aligned}$$
(16c)
$$\begin{aligned} \nabla {{\tilde{\varphi }}}\left( \tilde{v}\right) - {{\tilde{\alpha }}} = 0,\quad D{{\tilde{\gamma }}}\left( {{\tilde{w}}}\right) ^*{{\tilde{\lambda }}} - {{\tilde{\beta }}} = 0 \end{aligned}$$
(16d)
$$\begin{aligned} D\phi \left( {{\tilde{u}}}\right) ^\top \tilde{\pi }+V^*{{\tilde{\alpha }}} +W^*{{\tilde{\beta }}} = 0. \end{aligned}$$
(16e)

Conditions (16a)–(16b)–(16c) represent, respectively, primal feasibility, dual feasibility, and complementary slackness for (11) at \(({{\tilde{u}}},(\tilde{\pi },{{\tilde{\lambda }}}))\). We now prove that (16d)–(16e) imply the last remaining KKT condition for (11), namely stationarity. Substituting (16d) into (16e), we obtain:

$$\begin{aligned} D\phi \left( {{\tilde{u}}}\right) ^\top \tilde{\pi }+V^*\nabla {{\tilde{\varphi }}}\left( {{\tilde{v}}}\right) +W^*D{{\tilde{\gamma }}}\left( {{\tilde{w}}}\right) ^*{{\tilde{\lambda }}} = 0. \end{aligned}$$
(17)

For any \((u,v,w)\) such that \(Vu=v\) and \(Wu = w\), we have \(\varphi (u) = {{\tilde{\varphi }}}(Vu)\) and \(\gamma (u) = {{\tilde{\gamma }}}(Wu)\), implying that

$$\begin{aligned} \nabla \varphi (u)&= V^*\nabla {{\tilde{\varphi }}} \left( Vu\right) =V^*\nabla {{\tilde{\varphi }}} \left( v\right) \end{aligned}$$
(18a)
$$\begin{aligned} D\gamma (u)&= D{{\tilde{\gamma }}}\left( Wu\right) W = D\tilde{\gamma }\left( w\right) W. \end{aligned}$$
(18b)

Using (18) at \((u,v,w)=(\tilde{u},{{\tilde{v}}},{{\tilde{w}}})\) and replacing in (17) we get \(D\phi ({{\tilde{u}}})^\top {{\tilde{\pi }}} +\nabla \varphi ({{\tilde{u}}}) +D\gamma ({{\tilde{u}}})^*{{\tilde{\lambda }}} = 0\). \(\square\)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chicoisne, R. Computational aspects of column generation for nonlinear and conic optimization: classical and linearized schemes. Comput Optim Appl 84, 789–831 (2023). https://doi.org/10.1007/s10589-022-00445-0
