Abstract
Weighted constraint satisfaction problems (WCSPs) are an important constraint programming model whose goal is to find a cost-minimal solution. Because the problem is NP-hard, solving a WCSP usually requires effective heuristics to find high-quality solutions. Unfortunately, such heuristics are typically hand-crafted and may not generalize across instances. Although Deep Learning (DL) has proven to be a promising way to learn heuristics for combinatorial optimization problems, existing DL-based methods are unsuitable for WCSPs because they fail to exploit the problem structure. Moreover, such methods are often based on Supervised Learning (SL), which limits the learned heuristics since generating a sufficiently large training corpus is challenging. We therefore propose a novel Deep Reinforcement Learning (DRL) framework that trains the model on large-scale problems, so that it can mine more sophisticated patterns and provide high-quality heuristics. By exploiting the problem structure, we decompose the problem with a pseudo tree and formulate the solution construction process as a Markov decision process with multiple independent transition states. With a deep Q-value network parameterized by Graph Attention Networks (GATs), we learn the optimal Q-values through a modified Bellman equation that accounts for the multiple transition states, and extract solution construction heuristics from the trained network. Besides constructing solutions greedily, our heuristics can also be plugged into meta-heuristics such as beam search and large neighborhood search. Experiments show that our DRL-boosted algorithms significantly outperform their counterparts with heuristics derived from an SL model, their counterparts with traditional tabular heuristics, and state-of-the-art methods on benchmark problems.
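The pipeline the abstract describes — build a pseudo tree by depth-first traversal with a max-degree heuristic (ties broken alphabetically, per the note below), then construct a solution variable by variable, scoring candidate values with the learned Q-network — can be illustrated with a minimal sketch. Here the trained GAT Q-network is replaced by a stand-in scorer (the incremental local cost to already-assigned neighbors, i.e. a plain greedy baseline); all function names and the cost-table encoding are illustrative, not taken from the paper's code.

```python
def build_order(neighbors):
    """DFS variable ordering: root at the max-degree variable,
    visiting higher-degree neighbors first, ties broken alphabetically."""
    root = min(neighbors, key=lambda v: (-len(neighbors[v]), v))
    order, seen, stack = [], {root}, [root]
    while stack:
        v = stack.pop()
        order.append(v)
        # push least-preferred first so the preferred child is popped first
        for u in sorted((u for u in neighbors[v] if u not in seen),
                        key=lambda u: (-len(neighbors[u]), u), reverse=True):
            seen.add(u)
            stack.append(u)
    return order

def pair_cost(costs, u, a, v, b):
    """Look up the binary cost of (u=a, v=b), whichever way the pair is keyed."""
    if (u, v) in costs:
        return costs[(u, v)][(a, b)]
    if (v, u) in costs:
        return costs[(v, u)][(b, a)]
    return 0

def greedy_solve(domains, costs, neighbors):
    """Assign variables in pseudo-tree order; the paper scores values with
    a learned Q-network, here replaced by the local cost to assigned neighbors."""
    order, assign = build_order(neighbors), {}
    for v in order:
        assign[v] = min(domains[v], key=lambda val: sum(
            pair_cost(costs, u, assign[u], v, val)
            for u in neighbors[v] if u in assign))
    total = sum(tab[(assign[u], assign[w])] for (u, w), tab in costs.items())
    return assign, total
```

A tiny usage example: three Boolean variables with two soft binary constraints.

```python
domains = {'a': [0, 1], 'b': [0, 1], 'c': [0, 1]}
costs = {('a', 'b'): {(0, 0): 2, (0, 1): 0, (1, 0): 1, (1, 1): 3},
         ('a', 'c'): {(0, 0): 0, (0, 1): 2, (1, 0): 2, (1, 1): 0}}
neighbors = {'a': ['b', 'c'], 'b': ['a'], 'c': ['a']}
assign, total = greedy_solve(domains, costs, neighbors)
# a is assigned first (highest degree), then b and c each pick the
# value minimising their cost against a, yielding total cost 0 here
```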
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10489-022-03992-5/MediaObjects/10489_2022_3992_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10489-022-03992-5/MediaObjects/10489_2022_3992_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10489-022-03992-5/MediaObjects/10489_2022_3992_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10489-022-03992-5/MediaObjects/10489_2022_3992_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10489-022-03992-5/MediaObjects/10489_2022_3992_Fig5_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10489-022-03992-5/MediaObjects/10489_2022_3992_Fig6_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10489-022-03992-5/MediaObjects/10489_2022_3992_Fig7_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10489-022-03992-5/MediaObjects/10489_2022_3992_Fig8_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10489-022-03992-5/MediaObjects/10489_2022_3992_Fig9_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10489-022-03992-5/MediaObjects/10489_2022_3992_Fig10_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10489-022-03992-5/MediaObjects/10489_2022_3992_Fig11_HTML.png)
Notes
We use a depth-first traversal with a max-degree heuristic to build a pseudo tree, where the ties are broken in alphabetical order.
Our DRL model is available at https://github.com/czy920/DRL4WCSP.
About this article
Cite this article
Chen, D., Chen, Z., He, Z. et al. Learning heuristics for weighted CSPs through deep reinforcement learning. Appl Intell 53, 8844–8863 (2023). https://doi.org/10.1007/s10489-022-03992-5