
Learning heuristics for weighted CSPs through deep reinforcement learning

Published in Applied Intelligence

Abstract

Weighted constraint satisfaction problems (WCSPs) are one of the most important constraint programming models, aiming to find a cost-minimal solution. Due to their NP-hardness, solving WCSPs usually requires efficient heuristics to find high-quality solutions. Unfortunately, such heuristics are hand-crafted and may not generalize across different cases. On the other hand, although Deep Learning (DL) has proven to be a promising way to learn heuristics for combinatorial optimization problems, existing DL-based methods are unsuitable for WCSPs since they fail to exploit the problem structure of WCSPs. Moreover, such methods are often based on Supervised Learning (SL), which makes the learned heuristics less effective since it is challenging to generate a sufficiently large training corpus. Therefore, we propose a novel Deep Reinforcement Learning (DRL) framework to train the model on large-scale problems, so that the model can mine more sophisticated patterns and provide high-quality heuristics. By exploiting the problem structure, we effectively decompose the problem using a pseudo tree and formulate the solution construction process as a Markov decision process with multiple independent transition states. With a deep Q-value network parameterized by Graph Attention Networks (GATs), we learn the optimal Q-values through a modified Bellman equation that accounts for the multiple transition states, and we extract solution construction heuristics from the trained network. Besides constructing solutions greedily, our heuristics can also be applied within many meta-heuristics such as beam search and large neighborhood search. Experiments show that our DRL-boosted algorithms significantly outperform their counterparts whose heuristics are derived from an SL model, their counterparts with traditional tabular heuristics, and state-of-the-art methods on benchmark problems.
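To make the role of the modified Bellman equation concrete, here is one plausible form, sketched only from the abstract's description (the authors' exact formulation appears in the full text). Because the pseudo-tree decomposition splits the remaining problem into independent child branches once a variable is assigned, a single action leads to several independent successor states whose optimal costs add up:

$$Q^*(s, a) = r(s, a) + \sum_{i=1}^{k} \min_{a' \in \mathcal{A}(s'_i)} Q^*(s'_i, a'),$$

where $r(s,a)$ is the immediate cost incurred by the assignment, $s'_1, \dots, s'_k$ are the independent transition states induced by the $k$ child branches of the assigned variable in the pseudo tree, and the minimization (rather than the usual maximization) reflects cost minimization. With $k = 1$ this reduces to the standard Bellman optimality equation.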


Notes

  1. We use a depth-first traversal with a max-degree heuristic to build a pseudo tree, where ties are broken in alphabetical order (see the sketch after these notes).

  2. Our DRL model is available at https://github.com/czy920/DRL4WCSP.
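As a minimal illustration of Note 1, the following Python sketch builds the parent map of a pseudo tree from a constraint graph given as an adjacency list. The function name and data structures are our assumptions for illustration, not the authors' code (their implementation is in the repository above).

```python
# A minimal sketch of the pseudo-tree construction in Note 1, assuming a
# constraint graph given as {variable: set of neighboring variables}.
# Names here are illustrative, not taken from the authors' code.

def build_pseudo_tree(neighbors):
    """Depth-first traversal with a max-degree heuristic; ties broken
    alphabetically. Returns {variable: parent}, with None for the root."""
    # Root: the variable of maximum degree (alphabetically first on ties).
    root = min(neighbors, key=lambda v: (-len(neighbors[v]), v))
    parent = {root: None}

    def dfs(v):
        # Expand unvisited neighbors in decreasing-degree order,
        # breaking ties alphabetically. A DFS tree over an undirected
        # graph has only back edges, so constraints connect each
        # variable to its ancestors -- the defining pseudo-tree property.
        for u in sorted(neighbors[v], key=lambda u: (-len(neighbors[u]), u)):
            if u not in parent:
                parent[u] = v
                dfs(u)

    dfs(root)
    return parent

# Example: a 4-variable constraint graph.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(build_pseudo_tree(graph))  # {'c': None, 'a': 'c', 'b': 'a', 'd': 'c'}
```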


Author information

Correspondence to Ziyu Chen.


Appendix

Figures 12, 13, 14, 15 and 16 present comparisons of the DRL-based and SL-based algorithms on benchmark problems.

Fig. 12
Solution quality of the heuristics derived from our DRL model and the SL model on random WCSPs (70 variables)

Fig. 13
Solution quality of the heuristics derived from our DRL model and the SL model on random WCSPs (120 variables)

Fig. 14
Solution quality of the heuristics derived from our DRL model and the SL model on scale-free problems

Fig. 15
Solution quality of the heuristics derived from our DRL model and the SL model on random meeting scheduling problems

Fig. 16
Solution quality of the heuristics derived from our DRL model and the SL model on weighted graph coloring problems

About this article


Cite this article

Chen, D., Chen, Z., He, Z. et al. Learning heuristics for weighted CSPs through deep reinforcement learning. Appl Intell 53, 8844–8863 (2023). https://doi.org/10.1007/s10489-022-03992-5
