Computational aspects of column generation for nonlinear and conic optimization: classical and linearized schemes

Abstract

Solving large-scale nonlinear optimization problems requires either significant computing resources or the development of specialized algorithms. For Linear Programming (LP) problems, decomposition methods can take advantage of problem structure, gradually constructing the full problem by generating variables or constraints. We first present a direct adaptation of the Column Generation (CG) methodology for nonlinear optimization problems, such that when optimizing over a structured set \({\mathcal {X}}\) plus a moderate number of complicating constraints, we solve a succession of (1) restricted master problems on a smaller set \({\mathcal {S}}\subset {\mathcal {X}}\) and (2) pricing problems that are Lagrangean relaxations with respect to the complicating constraints. The former provides feasible solutions and feeds dual information to the latter. In turn, the pricing problem identifies a variable of interest that is then incorporated into an updated subset \({\mathcal {S}}'\subset {\mathcal {X}}\). Our approach is valid whenever the master problem has zero Lagrangean duality gap with respect to the complicating constraints, not only when \({\mathcal {S}}\) is the convex hull of the generated variables as in CG for LPs, but also for a variety of subsets such as the conic hull, the linear span, and a special variable aggregation set. We discuss how the structure of \({\mathcal {S}}\) and its update mechanism influence the number of iterations required to reach near-optimality and the difficulty of solving the restricted master problems, and we present linearized schemes that alleviate the computational burden of solving the pricing problem. We test our methods on synthetic portfolio optimization instances with up to 5 million variables, including nonlinear objective functions and second-order cone constraints. We show that some CGs with linearized pricing are 2–3 times faster than solving the complete problem directly and provide solutions within 1% of optimality in 6 h for the larger instances, whereas solving the complete problem runs out of memory.
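
As a rough illustration of the scheme sketched above, the following Python fragment outlines the alternation between restricted master and pricing problems. The callables solve_restricted_master, solve_pricing, and update_subset are hypothetical placeholders for problem-specific solvers; this is a minimal sketch, not the code used in the paper.

```python
def column_generation(solve_restricted_master, solve_pricing, update_subset,
                      S0, tol=1e-6, max_iters=100):
    """Generic column generation loop: alternate a restricted master problem
    over a subset S of X with a Lagrangean pricing problem over the full X.
    The three solver arguments are problem-specific callables (placeholders)."""
    S = S0                                    # current structured subset S of X
    best_ub, best_lb = float("inf"), -float("inf")
    x = y = None
    for _ in range(max_iters):
        # (1) Restricted master P(S): feasible point (x, y), duals of the
        #     complicating constraints, and its optimal value (an upper bound).
        x, y, lam, ub = solve_restricted_master(S)
        best_ub = min(best_ub, ub)
        # (2) Pricing L(X, lam): Lagrangean relaxation over the full set X;
        #     by weak duality its value is a lower bound on the optimum.
        x_new, lb = solve_pricing(lam)
        best_lb = max(best_lb, lb)
        if best_ub - best_lb <= tol * max(1.0, abs(best_ub)):
            break
        # (3) Incorporate the generated column into the updated subset S'.
        S = update_subset(S, x_new)
    return x, y, best_ub, best_lb
```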

Data availability statement

The source codes used to generate the computational results of this work are included in the supplementary material of the paper.

Notes

  1. f and g satisfy some convexity properties in Sect. 3.

  2. Strict feasibility is defined in Subsection 1.8.

  3. This point is explored in more detail in Sects. 4 and 5.

  4. More precisely, projections onto the variables x of a polyhedron in an extended space.

  5. Originally used in a Generalized Benders Decomposition context.

  6. In the LP case, the vector of reduced costs is \(c-X^\top \lambda ^k\) (see problem (4) in Subsection 1.5).

  7. The entropic risk measure is shown to be convex in z in [52].

References

  1. Acerbi, C., Simonetti, P.: Portfolio optimization with spectral measures of risk. arXiv preprint cond-mat/0203607 (2002)

  2. Ahmadi, A.A., Dash, S., Hall, G.: Optimization over structured subsets of positive semidefinite matrices via column generation. Discret. Optim. 24, 129–151 (2017)

  3. Álvarez, C., Mancilla-David, F., Escalona, P., Angulo, A.: A Bienstock–Zuckerberg-based algorithm for solving a network-flow formulation of the convex hull pricing problem. IEEE Trans. Power Syst. 35(3), 2108–2119 (2019)

  4. Barnhart, C., Johnson, E.L., Nemhauser, G.L., Savelsbergh, M.W.P., Vance, P.H.: Branch-and-price: column generation for solving huge integer programs. Oper. Res. 46(3), 316–329 (1998)

  5. Ben-Tal, A., El Ghaoui, L., Nemirovski, A.: Robust Optimization. Princeton University Press, Princeton (2009)

  6. Bergner, M., Caprara, A., Ceselli, A., Furini, F., Lübbecke, M.E., Malaguti, E., Traversi, E.: Automatic Dantzig–Wolfe reformulation of mixed integer programs. Math. Program. 149(1–2), 391–424 (2015)

  7. Bertsimas, D., Tsitsiklis, J.N.: Introduction to Linear Optimization, vol. 6. Athena Scientific, Belmont, MA (1997)

  8. Bienstock, D., Zuckerberg, M.: A new LP algorithm for precedence constrained production scheduling. Optimization Online, pp. 1–33 (2009)

  9. Bonnans, J.F., Gilbert, J.C., Lemaréchal, C., Sagastizábal, C.A.: Numerical Optimization: Theoretical and Practical Aspects. Springer Science & Business Media, Berlin, Germany (2006)

  10. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)

  11. Briant, O., Lemaréchal, C., Meurdesoif, P., Michel, S., Perrot, N., Vanderbeck, F.: Comparison of bundle and classical column generation. Math. Program. 113(2), 299–344 (2008)

  12. Burer, S.: On the copositive representation of binary and continuous nonconvex quadratic programs. Math. Program. 120(2), 479–495 (2009)

  13. Chicoisne, R., Ordoñez, F., Espinoza, D.: Risk averse shortest paths: a computational study. INFORMS J. Comput. 30(3), 539–553 (2018)

  14. Choi, E., Tcha, D.W.: A column generation approach to the heterogeneous fleet vehicle routing problem. Comput. Oper. Res. 34(7), 2080–2095 (2007)

  15. Chou, C.A., Liang, Z., Chaovalitwongse, W.A., Berger-Wolf, T.Y., DasGupta, B., Sheikh, S., Ashley, M.V., Caballero, I.C.: Column-generation framework of nonlinear similarity model for reconstructing sibling groups. INFORMS J. Comput. 27(1), 35–47 (2015)

  16. Dantzig, G.B., Wolfe, P.: The decomposition algorithm for linear programs. Econom. J. Econom. Soc. 767–778 (1961)

  17. Dentcheva, D., Ruszczyński, A.: Portfolio optimization with stochastic dominance constraints. J. Bank. Financ. 30(2), 433–451 (2006)

  18. Desaulniers, G., Desrosiers, J., Dumas, Y., Marc, S., Rioux, B., Solomon, M.M., Soumis, F.: Crew pairing at Air France. Eur. J. Oper. Res. 97(2), 245–259 (1997)

  19. Dong, H., Anstreicher, K.: Separating doubly nonnegative and completely positive matrices. Math. Program. 137(1–2), 131–153 (2013)

  20. Du Merle, O., Villeneuve, D., Desrosiers, J., Hansen, P.: Stabilized column generation. Discret. Math. 194(1–3), 229–237 (1999)

  21. Eckstein, J., Bertsekas, D.P.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55(1–3), 293–318 (1992)

  22. Espinoza, D., Moreno, E.: A primal-dual aggregation algorithm for minimizing conditional value-at-risk in linear programs. Comput. Optim. Appl. 59(3), 617–638 (2014)

  23. García, R., Marín, A., Patriksson, M.: Column generation algorithms for nonlinear optimization, I: convergence analysis. Optimization 52(2), 171–200 (2003)

  24. García, R., Marín, A., Patriksson, M.: Column generation algorithms for nonlinear optimization, II: numerical investigations. Comput. Oper. Res. 38(3), 591–604 (2011)

  25. Geoffrion, A.: Generalized Benders decomposition. J. Optim. Theory Appl. 10(4), 237–260 (1972)

  26. Giles, F.R., Pulleyblank, W.R.: Total dual integrality and integer polyhedra. Linear Algebra Appl. 25, 191–196 (1979)

  27. Glover, F.: Surrogate constraint duality in mathematical programming. Oper. Res. 23(3), 434–451 (1975)

  28. Goemans, M.X., Williamson, D.P.: Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM (JACM) 42(6), 1115–1145 (1995)

  29. Gorge, A., Lisser, A., Zorgati, R.: Generating cutting planes for the semidefinite relaxation of quadratic programs. Comput. Oper. Res. 55, 65–75 (2015)

  30. Greenberg, H., Pierskalla, W.: Surrogate mathematical programming. Oper. Res. 18(5), 924–939 (1970)

  31. Khaniyev, T., Elhedhli, S., Erenay, F.S.: Structure detection in mixed-integer programs. INFORMS J. Comput. 30(3), 570–587 (2018)

  32. Kleywegt, A.J., Shapiro, A., Homem-de-Mello, T.: The sample average approximation method for stochastic discrete optimization. SIAM J. Optim. 12(2), 479–502 (2002)

  33. Krokhmal, P., Palmquist, J., Uryasev, S.: Portfolio optimization with conditional value-at-risk objective and constraints. J. Risk 4, 43–68 (2002)

  34. Levy, H., Markowitz, H.M.: Approximating expected utility by a function of mean and variance. Am. Econ. Rev. 69(3), 308–317 (1979)

  35. Lovász, L.: On the Shannon capacity of a graph. IEEE Trans. Inf. Theory 25(1), 1–7 (1979)

  36. Lübbecke, M., Desrosiers, J.: Selected topics in column generation. Oper. Res. 53(6), 1007–1023 (2005)

  37. Müller, B., Muñoz, G., Gasse, M., Gleixner, A., Lodi, A., Serrano, F.: On generalized surrogate duality in mixed-integer nonlinear programming. In: International Conference on Integer Programming and Combinatorial Optimization, pp. 322–337. Springer (2020)

  38. Muñoz, G., Espinoza, D., Goycoolea, M., Moreno, E., Queyranne, M., Rivera, O.: A study of the Bienstock–Zuckerberg algorithm: applications in mining and resource constrained project scheduling. Comput. Optim. Appl. 69(2), 501–534 (2018)

  39. Murphy, F.H.: A column generation algorithm for nonlinear programming. Math. Program. 5(1), 286–298 (1973)

  40. Muts, P., Nowak, I., Hendrix, E.M.: On decomposition and multiobjective-based column and disjunctive cut generation for MINLP. Optim. Eng. 22(3), 1389–1418 (2021)

  41. Nesterov, Y., Nemirovskii, A.: Interior-point polynomial algorithms in convex programming. SIAM (1994)

  42. Ni, W., Shu, J., Song, M., Xu, D., Zhang, K.: A branch-and-price algorithm for facility location with general facility cost functions. INFORMS J. Comput. 33(1), 86–104 (2021)

  43. Nocedal, J., Wright, S.: Numerical Optimization. Springer Science & Business Media, Berlin, Germany (2006)

  44. Nowak, I., Breitfeld, N., Hendrix, E.M., Njacheun-Njanzoua, G.: Decomposition-based inner- and outer-refinement algorithms for global optimization. J. Glob. Optim. 72(2), 305–321 (2018)

  45. Park, Y.W.: Optimization for \(\ell _1\)-norm error fitting via data aggregation. INFORMS J. Comput. 33(1), 120–142 (2021)

  46. Pessoa, A., Sadykov, R., Uchoa, E., Vanderbeck, F.: Automation and combination of linear-programming based stabilization techniques in column generation. INFORMS J. Comput. 30(2), 339–360 (2018)

  47. Petra, C.G., Schenk, O., Anitescu, M.: Real-time stochastic optimization of complex energy systems on high-performance computers. Comput. Sci. Eng. 16(5), 32–42 (2014)

  48. Petra, C.G., Schenk, O., Lubin, M., Gärtner, K.: An augmented incomplete factorization approach for computing the Schur complement in stochastic optimization. SIAM J. Sci. Comput. 36(2), C139–C162 (2014)

  49. Pirnay, H., Lopez-Negrete, R., Biegler, L.: Optimal sensitivity based on IPOPT. Math. Program. Comput. 4(4), 307–331 (2012)

  50. Pisinger, W.D., Rasmussen, A.B., Sandvik, R.: Solution of large quadratic knapsack problems through aggressive reduction. INFORMS J. Comput. 19(2), 280–290 (2007)

  51. Porumbel, D., Clautiaux, F.: Constraint aggregation in column generation models for resource-constrained covering problems. INFORMS J. Comput. 29(1), 170–184 (2017)

  52. Pratt, J.W.: Risk aversion in the small and in the large. Econom. J. Econom. Soc. 32(1/2), 122–136 (1964)

  53. Ruszczyński, A.: On convergence of an augmented Lagrangian decomposition method for sparse convex optimization. Math. Oper. Res. 20(3), 634–656 (1995)

  54. Sadykov, R., Lazarev, A., Shiryaev, V., Stratonnikov, A.: Solving a freight railcar flow problem arising in Russia. In: ATMOS-13th Workshop on Algorithmic Approaches for Transportation Modelling, Optimization, and Systems-2013. Dagstuhl Open Access Series in Informatics (2013)

  55. Sadykov, R., Vanderbeck, F.: Column generation for extended formulations. EURO J. Comput. Optim. 1(1–2), 81–115 (2013)

  56. Song, Y., Luedtke, J.: An adaptive partition-based approach for solving two-stage stochastic programs with fixed recourse. SIAM J. Optim. 25(3), 1344–1367 (2015)

  57. Sponsel, J., Dür, M.: Factorization and cutting planes for completely positive matrices by copositive projection. Math. Program. 143(1–2), 211–229 (2014)

  58. Sun, Y., Andersen, M.S., Vandenberghe, L.: Decomposition in conic optimization with partially separable structure. SIAM J. Optim. 24(2), 873–897 (2014)

  59. Vandenberghe, L., Andersen, M.S.: Chordal graphs and semidefinite optimization. Found. Trends Optim. 1(4), 241–433 (2015)

  60. Vielma, J.P., Ahmed, S., Nemhauser, G.L.: A lifted linear programming branch-and-bound algorithm for mixed-integer conic quadratic programs. INFORMS J. Comput. 20(3), 438–450 (2008)

  61. Von Hohenbalken, B.: Simplicial decomposition in nonlinear programming algorithms. Math. Program. 13(1), 49–68 (1977)

  62. Wächter, A., Biegler, L.: On the implementation of a primal-dual interior point filter line search algorithm for large-scale nonlinear programming. Math. Program. 106(1), 25–57 (2006)

  63. Wang, J., Ralphs, T.: Computational experience with hypergraph-based methods for automatic decomposition in discrete optimization. In: International Conference on AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, pp. 394–402. Springer (2013)

  64. Wang, Y., Yin, W., Zeng, J.: Global convergence of ADMM in nonconvex nonsmooth optimization. J. Sci. Comput. 78(1), 29–63 (2019)

  65. Zangwill, W.I.: The convex simplex method. Manag. Sci. 14(3), 221–238 (1967)

  66. Zheng, Y., Fantuzzi, G., Papachristodoulou, A., Goulart, P., Wynn, A.: Fast ADMM for semidefinite programs with chordal sparsity. In: 2017 American Control Conference (ACC), pp. 3335–3340. IEEE (2017)

  67. Zheng, Y., Fantuzzi, G., Papachristodoulou, A., Goulart, P., Wynn, A.: Chordal decomposition in operator-splitting methods for sparse semidefinite programs. Math. Program. 180(1), 489–532 (2020)

  68. Zheng, Y., Sootla, A., Papachristodoulou, A.: Block factor-width-two matrices and their applications to semidefinite and sum-of-squares optimization. IEEE Transactions on Automatic Control (2022)

Acknowledgements

The author thanks Victor Bucarey, Bernard Fortz, Bernard Gendron (this paper is dedicated to his memory; he passed away in July 2022), Gonzalo Muñoz, Fernando Ordóñez, Dana Pizarro, Jupiler™, and two anonymous reviewers for their valuable comments on an early version of this work. Powered@NLHPC: this research was partially supported by the supercomputing infrastructure of the NLHPC (ECM-02).

Author information

Corresponding author

Correspondence to Renaud Chicoisne.

Ethics declarations

Conflict of interest

The author declares that he has no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Proof of Example 4

We necessarily have \(y_2=x_3=x_4=x_5=x_6=0\), \(y_1=1-x_1\) and \(x_2=1\), which turns the SDP constraint into \(x_1\geqslant 0\) and we have \(\omega ({\mathcal {X}})=0\). The conic dual of the latter problem is

$$\begin{aligned} \omega ':=\max _{\lambda }\left\{ -\lambda _1-\lambda _2 : \begin{pmatrix} \lambda _1&{}\lambda _4&{}\lambda _5\\ \lambda _4&{}\lambda _2&{}\lambda _6\\ \lambda _5&{}\lambda _6&{}\lambda _3 \end{pmatrix} \succeq 0,\quad 1-\lambda _2+2\lambda _5=0,\quad \lambda _1=0 \right\} . \end{aligned}$$

We necessarily have \(\lambda _1=\lambda _4=\lambda _5=0\), \(\lambda _2=1\), which turns the SDP constraint into \(\lambda _3\geqslant \lambda _6^2\) and we have \(\omega '=-1<0=\omega ({\mathcal {X}})\), proving that \(P({\mathcal {X}})\) does not satisfy Assumption 1.

Now, by construction \({\mathcal {S}}\subset {\mathcal {X}}\) and \(P({\mathcal {S}})\) is a feasible LP that always satisfies Assumption 1. \(\square\)

Appendix B: Proof of Proposition 1

We prove that the point \(({{\tilde{x}}}, {{\bar{y}}})\) constructed below is strictly feasible for \(P({\mathcal {S}})\). Because \(-g({{\bar{x}}},{{\bar{y}}})\in \text {relint }{\mathcal {C}}\) and \({\mathcal {C}}\) is proper, there is \(\rho >0\) such that \({\mathcal {B}}(-g({{\bar{x}}},\bar{y}),\rho )\subset \text {relint }{\mathcal {C}}\). Consider \(({{\hat{x}}},\hat{\theta })\) strictly feasible for \({\mathcal {S}}\), and define \(\epsilon >0\) and \({\tilde{x}}\) as follows:

$$\begin{aligned} \epsilon = \min \left\{ 1, \frac{\rho }{L || {{\hat{x}}} - {{\bar{x}}} ||}\right\} ,&\quad {{\tilde{x}}} = {{\bar{x}}} + \epsilon ({{\hat{x}}} - {{\bar{x}}}). \end{aligned}$$

Because \({{\bar{x}}}\in {\mathcal {S}}\) and \(({{\hat{x}}},{{\hat{\theta }}})\) is strictly feasible for \({\mathcal {S}}\), there exists \({{\bar{\theta }}}\) such that \(\phi ({{\bar{x}}}, {{\bar{\theta }}})=0\), \(-\gamma ({{\bar{x}}}, {{\bar{\theta }}})\in {\mathcal {K}}\), \(\phi ({{\hat{x}}}, {{\hat{\theta }}})=0\) and \(-\gamma ({{\hat{x}}}, {{\hat{\theta }}})\in \text {relint }{\mathcal {K}}\). \(\phi\) being linear, defining \({{\tilde{\theta }}} = {{\bar{\theta }}}+\epsilon ({{\hat{\theta }}} - \bar{\theta })\), we immediately have \(\phi ({{\tilde{x}}},{{\tilde{\theta }}})=0\). Further, \(\epsilon >0\), \({\mathcal {K}}\) is proper, \(-\gamma ({{\bar{x}}},\bar{\theta }) \in {\mathcal {K}}\) and \(-\gamma ({{\hat{x}}},{{\hat{\theta }}}) \in \text {relint }{\mathcal {K}}\) so we obtain \(-\epsilon \gamma ({{\hat{x}}},{{\hat{\theta }}})-(1-\epsilon ) \gamma ({{\bar{x}}},\bar{\theta })\in \text {relint }{\mathcal {K}}\). Because \(\gamma\) is \({\mathcal {K}}\)-convex and \(\epsilon \in ]0,1]\) we obtain \(-\gamma ({{\tilde{x}}},{{\tilde{\theta }}})\in \text {relint }{\mathcal {K}}\). We just proved that \(({{\tilde{x}}},{{\tilde{\theta }}})\) is strictly feasible for \({\mathcal {S}}\). We finish by proving that \(({{\tilde{x}}}, {\bar{y}})\) is strictly feasible for \(P({\mathcal {S}})\): By definition we have \(\rho \geqslant L\epsilon || {{\hat{x}}} - {{\bar{x}}} ||=L||{{\tilde{x}}}- {{\bar{x}}}|| =L||({{\tilde{x}}}- {{\bar{x}}};{{\bar{y}}}- {{\bar{y}}})||\). Because g is L-Lipschitz, we obtain that \(\rho \geqslant ||g({{\tilde{x}}},{{\bar{y}}}) - g({{\bar{x}}},{{\bar{y}}})||\), meaning that \(-g({{\tilde{x}}},{{\bar{y}}}) \in {\mathcal {B}}(-g({\bar{x}}, {{\bar{y}}}),\rho )\subset \text {relint }{\mathcal {C}}\). Because \(P({\mathcal {S}})\) is convex, Slater’s condition holds, thus finishing the proof. \(\square\)

Appendix C: Proof of Proposition 2

If \(L({\mathcal {X}},\lambda ^k)\) satisfies the P-property with respect to y, the problem in y given some \(x\in {\mathcal {X}}\) is equivalent to the problem in y given \(x=x^k\), i.e. for any \(x\in {\mathcal {X}}\) we have

$$\begin{aligned}&\arg \min \limits _{y}\left\{ f\left( x,y\right) +\left\langle \lambda ^k , g\left( x,y\right) \right\rangle \right\} \\ \nonumber =&\arg \min \limits _{y}\left\{ f\left( x^k,y\right) + \left\langle \lambda ^k , g\left( x^k,y\right) \right\rangle \right\} . \end{aligned}$$
(13)

Because \(P({\mathcal {S}}^k)\) satisfies Assumption 1, \((x^k,y^k)\) is optimal for \(L({\mathcal {S}}^k,\lambda ^k)\), i.e.

$$\begin{aligned} \omega ({\mathcal {S}}^k,\lambda ^k)= \min _{x\in {\mathcal {S}}^k,y}\left\{ f\left( x,y\right) +\left\langle \lambda ^k , g\left( x,y\right) \right\rangle \right\} =f\left( x^k,y^k\right) +\left\langle \lambda ^k , g\left( x^k,y^k\right) \right\rangle . \end{aligned}$$

In consequence, \(y^k\in \arg \min _{y}\{f(x^k,y)+\langle \lambda ^k , g(x^k,y)\rangle \}\). Using equality (13) finishes the proof. \(\square\)

Appendix D: Proof of Lemma 1

First, for any \(t\in [0,1]\) and any pair \((u^1,u^2)\in {\mathcal {U}}\times {\mathcal {U}}\) we have: \(t\varphi (u^1) +(1-t)\varphi (u^2) -\varphi (t u^1+(1-t)u^2) \in {\mathcal {K}}\). Given that \(\lambda \in {\mathcal {K}}^*\) we have \(\langle \lambda , t\varphi (u^1) +(1-t)\varphi (u^2) -\varphi (t u^1+(1-t)u^2) \rangle \geqslant 0\), meaning that

$$\begin{aligned} t \underbrace{ \left\langle \lambda , \varphi \left( u^1\right) \right\rangle }_{ \psi \left( u^1\right) } +(1-t) \underbrace{ \left\langle \lambda , \varphi \left( u^2\right) \right\rangle }_{ \psi \left( u^2\right) } \geqslant \underbrace{ \left\langle \lambda , \varphi \left( t u^1+(1-t)u^2\right) \right\rangle }_{ \psi \left( t u^1+(1-t)u^2\right) }. \end{aligned}$$
\(\square\)

Appendix E: Proof of Lemma 2

The linear approximation of \(\psi\) at \({{\bar{u}}}\) is \({{\bar{\psi }}}[\bar{u}](u):= \psi ({{\bar{u}}})+ \langle \nabla \psi [{{\bar{u}}} ],u-{{\bar{u}}}\rangle\). By definition, for every \(u\in {\mathcal {U}}\):

$$\begin{aligned} \left\langle \nabla \psi \left[ {{\bar{u}}} \right] ,u-{{\bar{u}}}\right\rangle =&\lim \limits _{\epsilon \rightarrow 0} \frac{\psi \left( {{\bar{u}}} +\epsilon \left( u-{{\bar{u}}}\right) \right) -\psi \left( {{\bar{u}}}\right) }{\epsilon }\\ =&\lim \limits _{\epsilon \rightarrow 0} \frac{ \left\langle \lambda , \varphi \left( {{\bar{u}}} +\epsilon \left( u-{{\bar{u}}}\right) \right) \right\rangle -\left\langle \lambda , \varphi \left( {{\bar{u}}}\right) \right\rangle }{\epsilon }\\ =&\left\langle \lambda , \lim \limits _{\epsilon \rightarrow 0} \frac{ \varphi \left( {{\bar{u}}} +\epsilon \left( u-{{\bar{u}}}\right) \right) -\varphi \left( {{\bar{u}}}\right) }{\epsilon } \right\rangle = \left\langle \lambda , D\varphi \left( {{\bar{u}}} \right) \left( u-{{\bar{u}}}\right) \right\rangle , \end{aligned}$$

implying that \({{\bar{\psi }}}[{{\bar{u}}}](u)= \langle \lambda , \varphi (\bar{u})\rangle + \langle \lambda , D\varphi ({{\bar{u}}} ) (u-{{\bar{u}}}) \rangle =\langle \lambda , {{\bar{\varphi }}}[{{\bar{u}}}](u)\rangle\). \(\square\)

Appendix F: Proof of Lemma 3

By convexity of \({\mathcal {U}}\) and optimality of \(u^*\), for any \(\epsilon \in ]0,1]\) and any \(u\in {\mathcal {U}}\) we have \(\varphi (u^ *)\leqslant \varphi (u^ *+\epsilon (u-u^ *))\), i.e.

$$\begin{aligned} \frac{\varphi \left( u^ *+\epsilon \left( u-u^*\right) \right) -\varphi \left( u^*\right) }{\epsilon }\geqslant 0,&\quad \forall \epsilon \in ]0,1], \forall u \in {\mathcal {U}}, \end{aligned}$$

which implies that when \(\epsilon \rightarrow 0\), for any \(u \in {\mathcal {U}}\) we have \(\left\langle \nabla \varphi \left( u^ *\right) ,u-u^ *\right\rangle \geqslant 0\), i.e.

$$\begin{aligned} \underbrace{ \varphi \left( u^*\right) +\left\langle \nabla \varphi \left( u^ *\right) ,u-u^ *\right\rangle }_{ \bar{\varphi }\left[ u^*\right] (u) } \geqslant \underbrace{ \varphi \left( u^*\right) +\left\langle \nabla \varphi \left( u^ *\right) ,u^*-u^ *\right\rangle }_{ \bar{\varphi }\left[ u^*\right] \left( u^*\right) },&\quad \forall u \in {\mathcal {U}}, \end{aligned}$$

meaning that \(u^*\) is optimal for \(\min _{u\in {\mathcal {U}}} \bar{\varphi }[u^*](u)\) and implying \(\omega ^ *= {{\bar{\omega }}}[u^ *]\). \(\square\)

Appendix G: Proof of Proposition 3

First, notice that the following holds:

$$\begin{aligned} {{\bar{f}}} \left[ x^k,y^k\right] (x,y)=&f\left( x^k,y^k\right) +\left\langle \nabla _x f\left( x^k,y^k\right) ,x-x^k\right\rangle +\left\langle \nabla _y f\left( x^k,y^k\right) ,y-y^k\right\rangle \\ {{\bar{g}}} \left[ x^k,y^k\right] (x,y)=&g\left( x^k,y^k\right) +D_x g\left( x^k,y^k\right) \left( x-x^k\right) +D_y g\left( x^k,y^k\right) \left( y-y^k\right) \end{aligned}$$

In consequence, we have that:

$$\begin{aligned} \nonumber \bar{\omega }\left[ x^{k},y^k\right] \left( {\mathcal {X}},\lambda ^{k}\right) =&f\left( x^k,y^k\right) -\left\langle \nabla _x f\left( x^k,y^k\right) ,x^k\right\rangle -\left\langle \nabla _y f\left( x^k,y^k\right) ,y^k\right\rangle \\ \nonumber&+\left\langle \lambda ^{k} , g\left( x^k,y^k\right) -D_x g\left( x^k,y^k\right) x^k -D_y g\left( x^k,y^k\right) y^k\right\rangle \\ \nonumber&+\min \limits _{x\in {\mathcal {X}}} \left\{ \left\langle \nabla _x f\left( x^k,y^k\right) ,x\right\rangle +\left\langle \lambda ^{k} , D_x g\left( x^k,y^k\right) x\right\rangle \right\} \\&+\min \limits _{y} \left\{ \left\langle \nabla _y f\left( x^k,y^k\right) ,y\right\rangle +\left\langle \lambda ^{k} , D_y g\left( x^k,y^k\right) y\right\rangle \right\} , \end{aligned}$$
(14)

which is separable in x and y. Without loss of generality, we can assume that (14) attains its minimum; otherwise the master problem is unbounded and we can stop. Under this assumption, the first-order optimality conditions in y for the master problem \(P({\mathcal {S}}^k)\) are \(\nabla _y f(x^k,y^k)+D_y g ( x^k,y^k)^*\lambda ^k =0\), in turn implying that for any y: \(\langle \nabla _y f(x^k,y^k),y\rangle +\langle \lambda ^k,D_y g ( x^k,y^k) y\rangle =0\), meaning that the objective of the minimization in y in (14) is identically zero. \(\square\)

Appendix H: Proof of Proposition 5

Let us define the following subsets of [p]:

$$\begin{aligned} {\mathcal {J}}_+:=\left\{ j\in [p]: \gamma _j<0, \alpha _j>0 \right\} ,&\quad {\mathcal {J}}_-:=\left\{ j\in [p]: \gamma _j>0, \alpha _j<0 \right\} ,\\ {\mathcal {J}}_0:=\left\{ j\in [p]: \gamma _j, \alpha _j\geqslant 0 \right\} ,&\quad {\mathcal {J}}_\mu :=\left\{ j\in [p]: \gamma _j, \alpha _j\leqslant 0 \right\} . \end{aligned}$$

First notice that we can fix beforehand the following variables:

$$\begin{aligned} z^*_j = {\left\{ \begin{array}{ll} \mu _j&{}\text { if } j\in {\mathcal {J}}_\mu \\ 0&{}\text { if } j\in {\mathcal {J}}_0. \end{array}\right. } \end{aligned}$$

Next, for every \(j\in {\mathcal {J}}_-\), we use the change of variable \(z_j\leftarrow \mu _j-z_j\), obtaining the following problem:

$$\begin{aligned} \omega ^*:= \sum \limits _{j\in {\mathcal {J}}_-\cup {\mathcal {J}}_\mu } \gamma _j\mu _j + \min \limits _{z}\quad&\sum \limits _{j\in {\mathcal {J}}_+\cup {\mathcal {J}}_-} {{\hat{\gamma }}}_jz_j \\ \text {s.t.:}\quad&\sum \limits _{j\in {\mathcal {J}}_+\cup {\mathcal {J}}_-} {{\hat{\alpha }}}_jz_j \leqslant {{\hat{\beta }}} \\&z_j\in \left[ 0,\mu _j\right] ,&\forall j \in {\mathcal {J}}_+\cup {\mathcal {J}}_- \end{aligned}$$

where \({{\hat{\beta }}}:= \beta -\sum _{j\in {\mathcal {J}}_-\cup {\mathcal {J}}_\mu } \alpha _j \mu _j\) and:

$$\begin{aligned} {{\hat{\alpha }}}_j:= {\left\{ \begin{array}{ll} \alpha _j&{} \text { if } j\in {\mathcal {J}}_+\\ -\alpha _j&{} \text { if } j\in {\mathcal {J}}_- \end{array}\right. } \qquad {{\hat{\gamma }}}_j:= {\left\{ \begin{array}{ll} \gamma _j &{}\text { if } j\in {\mathcal {J}}_+\\ -\gamma _j&{}\text { if } j\in {\mathcal {J}}_-. \end{array}\right. } \end{aligned}$$

The latter is a knapsack problem with positive capacity \({{\hat{\beta }}}\) and weights \({{\hat{\alpha }}}_j\) that can be solved by sorting the remaining indices \(j\in {\mathcal {J}}_+\cup {\mathcal {J}}_-\) in increasing disutility \({{\hat{\gamma }}}_j/{{\hat{\alpha }}}_j\) and filling the capacity constraint until no variable is available or the capacity constraint is tight. \(\square\)
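
For concreteness, the following NumPy sketch implements the variable fixing, the change of variable on \({\mathcal {J}}_-\), and the greedy filling described in this proof. It is an illustrative implementation of the argument, not the code used in the paper, and the names mirror the proof's notation.

```python
import numpy as np

def pricing_knapsack(gamma, alpha, beta, mu):
    """Greedy procedure from the proof of Proposition 5 for
       min_z  <gamma, z>  s.t.  <alpha, z> <= beta,  0 <= z_j <= mu_j."""
    gamma = np.asarray(gamma, float)
    alpha = np.asarray(alpha, float)
    mu = np.asarray(mu, float)
    z = np.zeros_like(mu)

    # Index sets of the proof
    J_plus = (gamma < 0) & (alpha > 0)
    J_minus = (gamma > 0) & (alpha < 0)
    J_mu = (gamma <= 0) & (alpha <= 0)

    # Variables fixed beforehand: z_j = mu_j on J_mu, z_j = 0 on J_0
    z[J_mu] = mu[J_mu]
    # Change of variable z_j <- mu_j - z_j on J_minus (hat variable starts at 0)
    z[J_minus] = mu[J_minus]

    g_hat = np.where(J_minus, -gamma, gamma)   # <= 0 on J_plus and J_minus
    a_hat = np.where(J_minus, -alpha, alpha)   # >  0 on J_plus and J_minus
    cap = beta - alpha[J_minus | J_mu] @ mu[J_minus | J_mu]   # residual capacity beta_hat

    # Greedy fill by increasing disutility g_hat / a_hat until the capacity is tight
    free = np.flatnonzero(J_plus | J_minus)
    for j in free[np.argsort(g_hat[free] / a_hat[free])]:
        if cap <= 0:
            break
        take = min(mu[j], cap / a_hat[j])
        cap -= a_hat[j] * take
        z[j] = mu[j] - take if J_minus[j] else take   # undo the change of variable

    return z, float(gamma @ z)
```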

Appendix I: Proof of Proposition 6

It is enough to show that \((u^*,( \pi ^*,\hat{\lambda },{\hat{\lambda }}_0))\) satisfies the KKT conditions for (9):

$$\begin{aligned} \phi \left( u^*\right) \leqslant 0,\quad -\left( \gamma \left( u^*\right) ,\gamma _0\right) \in {\mathcal {L}}_2 \end{aligned}$$
(15a)
$$\begin{aligned} \pi ^*\geqslant 0,\quad \phi \left( u^*\right) ^\top \pi ^*= 0 \end{aligned}$$
(15b)
$$\begin{aligned} \left( {{\hat{\lambda }}},{\hat{\lambda }}_0\right) \in {\mathcal {L}}_2 \end{aligned}$$
(15c)
$$\begin{aligned} \gamma \left( u^*\right) ^\top \hat{\lambda }+\gamma _0 {{\hat{\lambda }}}_0 = 0 \end{aligned}$$
(15d)
$$\begin{aligned} \nabla \varphi \left( u^*\right) + D\phi \left( u^*\right) ^\top \pi ^*+ D\gamma \left( u^*\right) ^\top {{\hat{\lambda }}} = 0. \end{aligned}$$
(15e)

(15a) and (15b) are trivially satisfied. Given that \(-\gamma _0,\lambda ^*\geqslant 0\), the remaining conditions are equivalent to:

$$\begin{aligned} \left( \lambda ^*\right) ^2\left( \left| \left| \gamma \left( u^*\right) \right| \right| _2^2 -\gamma ^2_0\right) \leqslant 0\\ \lambda ^*\left( \left| \left| \gamma \left( u^*\right) \right| \right| _2^2 -\gamma ^2_0\right) = 0\\ \nabla \varphi \left( u^*\right) + D\phi \left( u^*\right) ^\top \pi ^*+ 2\lambda ^*D\gamma \left( u^*\right) ^\top \gamma \left( u^*\right) = 0, \end{aligned}$$

which are all implied by the KKT conditions for (10) at \((u^*,(\pi ^*,\lambda ^*))\). \(\square\)

Appendix J: Sparsity patterns for the enhanced models

Primal-dual interior point and Newton step. Given a barrier parameter \(\mu >0\) and an optimization problem \(\min _{u}\{\varphi (u):\phi (u)\leqslant 0,\,Ru=s\}\), the barrier problem is defined as \(\min _{u}\{\varphi (u)-\mu \sum _{i=1}^p\ln (-\phi _i(u)):Ru=s\}\). The first-order optimality conditions for the barrier problem are the following perturbed conditions:

$$\begin{aligned} \nabla \varphi (u)+R^\top \pi +\beta ^\top D\phi (u)=0\\ \mu +\beta _i\phi _i(u)=0\\ Ru=s. \end{aligned}$$

The heavy computational work at each iteration of a primal-dual interior point algorithm occurs during the Newton step, which consists of approximately solving the latter system of equations by solving, in \((\Delta u,\Delta \beta , \Delta \pi )\), its first-order approximation:

$$\begin{aligned} \begin{pmatrix} \nabla ^2 \varphi (u) +\sum \limits _{i=1}^p\beta _i\nabla ^2\phi _i(u) &{} D\phi (u)^\top &{} R^\top \\ -\text {diag}(\beta )D\phi (u) &{} -\text {diag}(\phi (u)) &{} 0\\ R &{} 0 &{} 0 \end{pmatrix} \begin{pmatrix} \Delta u\\ \Delta \beta \\ \Delta \pi \end{pmatrix} + \begin{pmatrix} \nabla \varphi (u)+R^\top \pi +\beta ^\top D\phi (u)\\ \left( \mu +\beta _i\phi _i(u)\right) _{i\in [p]}\\ Ru-s \end{pmatrix} =0 \end{aligned}$$
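
To make this step concrete, here is a minimal dense linear-algebra sketch that assembles and solves the above system with NumPy. The callables grad_phi0, hess_phi0, phi, jac_phi, and hess_phi_i are hypothetical placeholders for the problem data, and the sketch deliberately ignores the sparsity structure discussed next.

```python
import numpy as np

def newton_step(u, beta, pi, mu, grad_phi0, hess_phi0, phi, jac_phi, hess_phi_i, R, s):
    """One (dense) Newton step of a primal-dual interior point method for
       min phi0(u)  s.t.  phi(u) <= 0,  R u = s,
    solving the linear system displayed above. Illustrative sketch only."""
    n, p, m = u.size, beta.size, pi.size
    Jp = jac_phi(u)                                   # p x n Jacobian of the inequalities
    H = hess_phi0(u) + sum(beta[i] * hess_phi_i(i, u) for i in range(p))
    K = np.block([
        [H,                    Jp.T,               R.T],
        [-np.diag(beta) @ Jp,  -np.diag(phi(u)),   np.zeros((p, m))],
        [R,                    np.zeros((m, p)),   np.zeros((m, m))],
    ])
    rhs = -np.concatenate([
        grad_phi0(u) + R.T @ pi + Jp.T @ beta,        # stationarity residual
        mu + beta * phi(u),                           # perturbed complementarity
        R @ u - s,                                    # primal equality residual
    ])
    delta = np.linalg.solve(K, rhs)
    return delta[:n], delta[n:n + p], delta[n + p:]   # (du, dbeta, dpi)
```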

Monolithic formulation. Figure 8 summarizes the number of nonzero coefficients per block in the Newton step's system, and the corresponding sparsity patterns without and with the numerical enhancement are displayed in Figs. 9 and 10.

Fig. 8
Number of nonzeroes per block: Non-enhanced model vs. enhanced

The non-enhanced version has \(n^2+10n+T+2\) nonzeroes and \(3n+T+2\) unknowns, while the enhanced version has \(n(4S+8)+S(S(T+1)+2T+7)+2+T\) nonzeroes and \(3n+2ST+4S+T+2\) unknowns. We summarize these results in Table 10 to illustrate the benefits of using the enhanced version instead of the original one (recall that we used \(S=20\)):

Table 10 Benefits of enhancing the master problem on our instances

In our instances, the enhanced version thus reduces the number of nonzero coefficients by a factor of at least 100, at the cost of less than one percent of additional unknowns.
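
As a quick sanity check of these counts, the short snippet below evaluates both formulas. The dimensions n and T used in the example call are illustrative placeholders, not the paper's instance sizes.

```python
def nnz_counts(n, T, S=20):
    """Nonzero and unknown counts of the Newton system for the monolithic master,
    using the formulas stated above (non-enhanced vs. enhanced)."""
    plain = {"nnz": n**2 + 10*n + T + 2, "unknowns": 3*n + T + 2}
    enhanced = {"nnz": n*(4*S + 8) + S*(S*(T + 1) + 2*T + 7) + 2 + T,
                "unknowns": 3*n + 2*S*T + 4*S + T + 2}
    return plain, enhanced

# Illustrative sizes only (hypothetical, not the paper's instances)
plain, enh = nnz_counts(n=100_000, T=50)
print(plain["nnz"] / enh["nnz"])            # nonzero reduction factor
print(enh["unknowns"] / plain["unknowns"])  # relative growth in unknowns
```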

Fig. 9
Sparsity pattern of the Newton step's matrix for P without numerical enhancement

Fig. 10
Sparsity pattern of the Newton step's matrix for P with numerical enhancement

Pricing problems. We only cover the pricing problem in which the variance constraint is treated as conic, since the pricing obtained when treating it as a single nonlinear constraint is very similar to the monolithic formulation. Given that the problem is separable in each \(x^t\) (and \(x^0=y\)), we successively solve the pricing problem for each \(t\in \{0,...,T\}\); this is the problem whose sparsity pattern we study. Figure 11 summarizes the number of nonzero coefficients per block in the Newton step's system, and Fig. 12 displays the corresponding sparsity patterns.

Fig. 11
Number of nonzeroes per block: Non-enhanced model vs. enhanced

The non-enhanced version has \(n_t^2+8n_t+1\) nonzeroes and \(3n_t+1\) unknowns, while the enhanced version has \(n_t(8+2S)+S(S+2)+1\) nonzeroes and \(3n_t+2S+1\) unknowns. We summarize these results in Table 11 to illustrate the benefits of using the enhanced version instead of the original one: again, the number of nonzero coefficients in our instances is reduced by a factor of at least 100, at the cost of less than one percent of additional unknowns.

Fig. 12
Sparsity patterns for the t-th pricing problem

Table 11 Benefits of enhancing the pricing problems on our instances

Appendix K: Proof of Proposition 7

The KKT conditions for (12) are

$$\begin{aligned} \phi \left( {{\tilde{u}}}\right) \leqslant 0,\quad -{{\tilde{\gamma }}}\left( {{\tilde{w}}}\right) \in {\mathcal {K}},\quad {{\tilde{v}}}=V{{\tilde{u}}},\quad {{\tilde{w}}}= W{{\tilde{u}}} \end{aligned}$$
(16a)
$$\begin{aligned} {{\tilde{\pi }}}\geqslant 0,\quad {{\tilde{\lambda }}}\in {\mathcal {K}}^*\end{aligned}$$
(16b)
$$\begin{aligned} \phi \left( {{\tilde{u}}}\right) ^\top {{\tilde{\pi }}} = 0,\quad \left\langle {{\tilde{\gamma }}}\left( {{\tilde{w}}}\right) ,{{\tilde{\lambda }}}\right\rangle = 0 \end{aligned}$$
(16c)
$$\begin{aligned} \nabla {{\tilde{\varphi }}}\left( \tilde{v}\right) - {{\tilde{\alpha }}} = 0,\quad D{{\tilde{\gamma }}}\left( {{\tilde{w}}}\right) ^*{{\tilde{\lambda }}} - {{\tilde{\beta }}} = 0 \end{aligned}$$
(16d)
$$\begin{aligned} D\phi \left( {{\tilde{u}}}\right) ^\top \tilde{\pi }+V^*{{\tilde{\alpha }}} +W^*{{\tilde{\beta }}} = 0. \end{aligned}$$
(16e)

Conditions (16a)–(16b)–(16c) represent, respectively, primal feasibility, dual feasibility, and complementary slackness for (11) at \(({{\tilde{u}}},(\tilde{\pi },{{\tilde{\lambda }}}))\). We now prove that (16d)–(16e) imply the last remaining KKT condition for (11), namely stationarity. Substituting (16d) into (16e), we obtain:

$$\begin{aligned} D\phi \left( {{\tilde{u}}}\right) ^\top \tilde{\pi }+V^*\nabla {{\tilde{\varphi }}}\left( {{\tilde{v}}}\right) +W^*D{{\tilde{\gamma }}}\left( {{\tilde{w}}}\right) ^*{{\tilde{\lambda }}} = 0. \end{aligned}$$
(17)

For any \((u,v,w)\) such that \(Vu=v\) and \(Wu = w\), we have \(\varphi (u) = {{\tilde{\varphi }}}(Vu)\) and \(\gamma (u) = {{\tilde{\gamma }}}(Wu)\), implying that

$$\begin{aligned} \nabla \varphi (u)&= V^*\nabla {{\tilde{\varphi }}} \left( Vu\right) =V^*\nabla {{\tilde{\varphi }}} \left( v\right) \end{aligned}$$
(18a)
$$\begin{aligned} D\gamma (u)&= D{{\tilde{\gamma }}}\left( Wu\right) W = D\tilde{\gamma }\left( w\right) W. \end{aligned}$$
(18b)

Using (18) at \((u,v,w)=(\tilde{u},{{\tilde{v}}},{{\tilde{w}}})\) and replacing in (17) we get \(D\phi ({{\tilde{u}}})^\top {{\tilde{\pi }}} +\nabla \varphi ({{\tilde{u}}}) +D\gamma ({{\tilde{u}}})^*{{\tilde{\lambda }}} = 0\). \(\square\)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chicoisne, R. Computational aspects of column generation for nonlinear and conic optimization: classical and linearized schemes. Comput Optim Appl 84, 789–831 (2023). https://doi.org/10.1007/s10589-022-00445-0
