
Learning heuristics for weighted CSPs through deep reinforcement learning

Published in Applied Intelligence

Abstract

Weighted constraint satisfaction problems (WCSPs) are one of the most important constraint programming models, aiming to find a cost-minimal solution. Due to their NP-hardness, solving WCSPs usually requires efficient heuristics to find high-quality solutions. Unfortunately, such heuristics are hand-crafted and may not generalize across different cases. On the other hand, although Deep Learning (DL) has proven to be a promising way to learn heuristics for combinatorial optimization problems, existing DL-based methods are unsuitable for WCSPs since they fail to exploit the problem structure of WCSPs. Moreover, such methods are often based on Supervised Learning (SL), which makes the learned heuristics less effective since it is challenging to generate a sufficiently large training corpus. Therefore, we propose a novel Deep Reinforcement Learning (DRL) framework to train the model on large-scale problems, so that the model can mine more sophisticated patterns and provide high-quality heuristics. By exploiting the problem structure, we effectively decompose the problem using a pseudo tree and formulate the solution construction process as a Markov decision process with multiple independent transition states. With a deep Q-value network parameterized by Graph Attention Networks (GATs), we learn the optimal Q-values through a modified Bellman equation that accounts for the multiple transition states, and we extract solution construction heuristics from the trained network. Besides constructing solutions greedily, our heuristics can also be applied within many meta-heuristics such as beam search and large neighborhood search. Experiments show that our DRL-boosted algorithms significantly outperform their counterparts whose heuristics are derived from an SL model, their counterparts with traditional tabular heuristics, and state-of-the-art methods on benchmark problems.
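To make the role of the modified Bellman equation concrete, here is one plausible form, sketched only from the abstract's description (the authors' exact formulation appears in the full text). Because the pseudo-tree decomposition splits the remaining problem into independent child branches once a variable is assigned, a single action leads to several independent successor states whose optimal costs add up:

$$Q^*(s, a) = r(s, a) + \sum_{i=1}^{k} \min_{a' \in \mathcal{A}(s'_i)} Q^*(s'_i, a'),$$

where $r(s,a)$ is the immediate cost incurred by the assignment, $s'_1, \dots, s'_k$ are the independent transition states induced by the $k$ child branches of the assigned variable in the pseudo tree, and the minimization (rather than the usual maximization) reflects cost minimization. With $k = 1$ this reduces to the standard Bellman optimality equation.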


Notes

  1. We use a depth-first traversal with a max-degree heuristic to build a pseudo tree, where ties are broken in alphabetical order (see the sketch after these notes).

  2. Our DRL model is available at https://github.com/czy920/DRL4WCSP.
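As a minimal illustration of Note 1, the following Python sketch builds the parent map of a pseudo tree from a constraint graph given as an adjacency list. The function name and data structures are our assumptions for illustration, not the authors' code (their implementation is in the repository above).

```python
# A minimal sketch of the pseudo-tree construction in Note 1, assuming a
# constraint graph given as {variable: set of neighboring variables}.
# Names here are illustrative, not taken from the authors' code.

def build_pseudo_tree(neighbors):
    """Depth-first traversal with a max-degree heuristic; ties broken
    alphabetically. Returns {variable: parent}, with None for the root."""
    # Root: the variable of maximum degree (alphabetically first on ties).
    root = min(neighbors, key=lambda v: (-len(neighbors[v]), v))
    parent = {root: None}

    def dfs(v):
        # Expand unvisited neighbors in decreasing-degree order,
        # breaking ties alphabetically. A DFS tree over an undirected
        # graph has only back edges, so constraints connect each
        # variable to its ancestors -- the defining pseudo-tree property.
        for u in sorted(neighbors[v], key=lambda u: (-len(neighbors[u]), u)):
            if u not in parent:
                parent[u] = v
                dfs(u)

    dfs(root)
    return parent

# Example: a 4-variable constraint graph.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(build_pseudo_tree(graph))  # {'c': None, 'a': 'c', 'b': 'a', 'd': 'c'}
```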


Author information

Correspondence to Ziyu Chen.


Appendix

Figures 12, 13, 14, 15 and 16 present comparisons of the DRL-based and SL-based algorithms on benchmark problems.

Fig. 12
Solution quality of the heuristics derived from our DRL model and the SL model on random WCSPs (70 variables)

Fig. 13
Solution quality of the heuristics derived from our DRL model and the SL model on random WCSPs (120 variables)

Fig. 14
Solution quality of the heuristics derived from our DRL model and the SL model on scale-free problems

Fig. 15
Solution quality of the heuristics derived from our DRL model and the SL model on random meeting scheduling problems

Fig. 16
Solution quality of the heuristics derived from our DRL model and the SL model on weighted graph coloring problems

About this article


Cite this article

Chen, D., Chen, Z., He, Z. et al. Learning heuristics for weighted CSPs through deep reinforcement learning. Appl Intell 53, 8844–8863 (2023). https://doi.org/10.1007/s10489-022-03992-5
