1 Introduction

Recently, there has been a surge of interest in applying machine learning to combinatorial optimization [1,2,3]. SAT, the most fundamental NP-complete problem in computer science [4], has a wide range of applications, from verification [5] to planning [6] and scheduling [7]. Benedikt Bünz et al. [8] proposed using graph neural networks (GNNs) to learn to solve SAT problems, encoding SAT instances as undirected graphs that preserve their rich invariances. Current applications of GNNs to SAT problems fall into two categories. The first builds an end-to-end model that takes a problem instance as input and directly predicts satisfiability or produces a solution. A seminal work is NeuroSAT [9], which uses a message-passing model to learn the satisfiability of random SAT instances and decodes each literal's vote through an activation function to arrive at a final set of solutions. Minghao Liu et al. [10] further investigated the quality of the solutions predicted by GNNs for maximum satisfiability (MaxSAT) problems from both theoretical and practical perspectives. The second category combines neural networks with traditional search frameworks in an attempt to improve key heuristics. For example, GNNs have been used to replace the variable selection function in WalkSAT solvers [11], to generate initial assignments for local search [12], and to guide search by predicting variables in the unsat core [13]. These works show that GNN models can learn invariant structural features from SAT instances and have the potential to improve SAT solving techniques in the future.

Graph attention networks [14] are spatial-domain graph neural networks equipped with an attention mechanism. Unlike earlier spectral-domain graph convolutional networks, they require no sophisticated computations with Laplacian matrices and update node features simply by aggregating first-order neighbor nodes. Through the attention mechanism, graph attention networks can learn different weights for different neighbors.

Graph attention networks have recently achieved good results in natural language processing tasks such as text classification, where the graph attention mechanism can identify word nodes that are discriminative for the class of a text [15]. We investigate whether the graph attention mechanism can be applied to propositional logic problems in the same way as to natural language. Propositional logic and natural language are two different kinds of languages [8]. Propositional logic problems are rich in invariant features that cannot be encoded by traditional RNNs and CNNs, yet, like natural language, they are highly structured. In natural language, words arranged in different orders express different meanings; for a news article, a single word in its title may determine the category to which it belongs. In propositional logic, literals and clauses take on different logical meanings depending on the conjunctions and disjunctions that connect them, and for a pair of SAT instances, the difference between satisfiable and unsatisfiable may be whether a single literal in one clause is inverted. As in natural language processing tasks, the prediction for a propositional logic problem hinges on a small number of key nodes, so a model's ability to predict the satisfiability of SAT problems depends heavily on whether it can capture the invariant structural features of those key nodes.

In this paper, we make the first attempt to solve SAT problems using graph attention networks and present GAT-SAT, a new framework for predicting the satisfiability of SAT problems. Traditional graph attention networks have a significant disadvantage: because they only aggregate information from first-order neighbor nodes, they handle higher-order neighbors poorly. To tackle this difficulty, we use message passing to ensure that each node carries some information about its higher-order neighbors. The attention scores of literal nodes for clause nodes and of clause nodes for literal nodes are then computed separately using the graph attention mechanism, which in turn yields weighted representations of the literal and clause vectors. Finally, the prediction is obtained by a multilayer perceptron. We compare against NeuroSAT on randomly generated SAT problems and on random 3-SAT problems. On the one hand, the input of our model, like that of NeuroSAT, is just the adjacency matrix of an arbitrary number of literals and clauses and contains no hand-crafted features. On the other hand, we regard NeuroSAT as a strong representative of graph neural network approaches based on the message-passing model. Experiments demonstrate that our model outperforms NeuroSAT in prediction accuracy on both kinds of randomly generated SAT instances. In particular, on the random SAT problems, we need only half the epochs of NeuroSAT to achieve the same or even higher accuracy. We attempt to explain the role of the graph attention mechanism in the model at the end of this paper. The main contributions and originality of our work are summarized as follows:

  • We use graph attention networks for the first time to solve stochastic SAT problems.

  • To address the drawbacks of traditional graph attention networks in solving SAT problems, we propose a new framework for graph attention networks incorporating message passing, GAT-SAT.

  • We have tried to explain the role played by the graph attention mechanism in solving SAT problems.

2 Background and Related Work

2.1 Background

A SAT problem involves finding a variable assignment that satisfies a propositional logic formula or showing that no such assignment exists. A formula of propositional logic is a Boolean expression built from the constants true (1) and false (0), variables, negations, conjunctions, and disjunctions. The term literal refers to a variable or its negation. It is convenient to represent Boolean formulas in conjunctive normal form (CNF), i.e., conjunctions of clauses, where a clause is a disjunction of literals. An example of a CNF formula is (\({x}_{1}\)∨¬\({x}_{2}\)) ∧ (\({x}_{2}\)∨¬\({x}_{3}\)), where ∧, ∨, ¬ denote conjunction, disjunction, and negation, respectively. This formula has two clauses: (\({x}_{1}\)∨¬\({x}_{2}\)) and (\({x}_{2}\)∨¬\({x}_{3}\)). Each conjunct of a formula in CNF is called a clause, and each (possibly negated) variable within a clause is called a literal. A formula in CNF is satisfiable if and only if there is an assignment such that every clause has at least one literal mapped to 1. A SAT problem is a formula in CNF, where the goal is to determine whether the formula is satisfiable and, if so, to produce a satisfying assignment of truth values to variables.
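For concreteness, the following short Python sketch (our own illustration, not part of the formal development) represents the example CNF (\({x}_{1}\)∨¬\({x}_{2}\)) ∧ (\({x}_{2}\)∨¬\({x}_{3}\)) as a list of integer literals and checks whether a given assignment satisfies it:

```python
# A clause is a list of non-zero integers: k stands for x_k and -k for its negation.
cnf = [[1, -2], [2, -3]]        # (x1 v ~x2) ^ (x2 v ~x3)

def is_satisfying(cnf, assignment):
    """The formula is satisfied iff every clause has at least one literal mapped to 1."""
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in cnf)

print(is_satisfying(cnf, {1: True, 2: True, 3: True}))    # True
print(is_satisfying(cnf, {1: False, 2: True, 3: True}))   # False: first clause is falsified
```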

2.2 Related Work

Over the past two decades, many researchers have used machine learning to predict specific properties of SAT instances, such as how long a solver will need to run or which algorithm to choose [16,17,18]. These methods rely on carefully crafted features that encode aspects of the input SAT instances. Nudelman et al. [19] produced a set of 84 features, derived from known heuristics (e.g., the ratio of positive to negative occurrences of each variable in clauses), tractable subclasses, and other measures of problem complexity, that can be used to predict solver performance. These features were subsequently combined with various machine learning models [20], including the random forest model investigated by Xu et al. [21]. Hutter et al. [18] used hand-designed features to predict the runtime of each instance; such features have also been used to build algorithm portfolios that improve performance by choosing a different solver on a per-instance basis [22, 23]. SAT instances, however, are rich in both general and particular structural elements that are typically lost by most feature extraction approaches.

Thanks to recent advances in representation learning, especially geometric deep learning [24, 25], good progress has been made in predicting SAT satisfiability with graph neural networks. Benedikt Bünz et al. [8] defined a graph representation for Boolean formulas in conjunctive normal form and trained neural classifiers over general graph structures, called graph neural networks, to identify features of satisfiability; their experiments showed that learning over graph structure can classify SAT instances. Selsam et al. [9] proposed a message-passing neural network, NeuroSAT, which learns to solve SAT problems when trained only as a classifier for predicting satisfiability. In subsequent work, the authors showed that NeuroSAT can provide effective guidance for high-performance SAT solvers on real-world problems. Chris Cameron et al. [26] encoded SAT problems as sparse matrices and predicted the satisfiability of stochastic 3-SAT problems, which remain challenging for solvers, by learning their structural features end to end. The authors demonstrated that on stochastic 3-SAT tasks, both exchangeable deep learning models and neural message-passing models deliver superior prediction performance, consistently beating models based on complicated hand-engineered features.

3 Model

The overview of our model, GAT-SAT, is presented in Fig. 1. We encode the SAT problem as an undirected graph G, represented by an adjacency matrix. For message passing, we create an embedding for each literal node \({l}_{i}\in \mathrm{G}\) and each clause node \({c}_{i}\in \mathrm{G}\). At each time step, each clause receives messages from its adjacent literals and updates its embedding accordingly, and each literal receives messages from its adjacent clauses and from its complement and updates its embedding accordingly, so the embedding of every node is refined iteratively. When message passing ends, GAT-SAT learns the attention weights \({a}_{ij}\) between nodes and then computes a weighted average of the feature vectors of literal nodes and clause nodes based on these weights, yielding new representations \(l_{i}^{^{\prime}}\) and \(c_{i}^{^{\prime}}\). Finally, these representations are passed into an MLP, the vote of each node is obtained through an activation function, and the average of the literal votes gives the predicted satisfiability of the problem, which is considered satisfiable if it is greater than 0.5 and unsatisfiable otherwise.

Fig. 1

The overview of our model, GAT-SAT

3.1 Formula Graph

As mentioned earlier, the ability of a model to predict the satisfiability of a Boolean formula depends on whether the input formula contains variable conflicts. If a weakly supervised algorithm is to learn to recognize these conflicting patterns, its input must include information about the relationships between the variables in the formula. To retain the structural information of the input formula, we first convert it into a formula graph. A general Boolean formula can be any expression consisting of variables, constants, conjunctions, disjunctions, and negations, and every Boolean formula can be reduced in linear time to an equisatisfiable formula in conjunctive normal form of linear length [27]. In conjunctive normal form (CNF), a SAT instance is the conjunction of clauses \({\mathrm{C}}_{1}\) ∧ \({\mathrm{C}}_{2}\) ∧ … ∧ \({\mathrm{C}}_{n}\), where each clause \({\mathrm{C}}_{i}\) = \({l}_{i1}\) ∨ \({l}_{i2}\) ∨ … ∨ \({l}_{in}\) is a disjunction of literals, with \({l}_{ij}\) = \({x}_{k}\) or \({l}_{ij}\) = ¬\({x}_{k}\). In this paper, we encode the SAT problem as an undirected graph in which each literal corresponds to a node and each clause corresponds to a node; there is an edge between each literal and each clause in which it appears, and each pair of complementary literals (e.g., \({x}_{i}\) and ¬\({x}_{i}\)) is connected by an edge of a different type. The element \({M}_{ij}\) of the adjacency matrix equals 1 when there is an edge between clause node i and literal node j, and 0 otherwise. For example, the Boolean formula (\({x}_{1}\)∨¬\({x}_{2}\)∨\({x}_{3}\)) ∧ ¬(\({x}_{1}\)∧¬\({x}_{2}\)∧¬\({x}_{4}\)) can be converted to (\({x}_{1}\)∨¬\({x}_{2}\)∨\({x}_{3}\)) ∧ (¬\({x}_{1}\)∨\({x}_{2}\)∨\({x}_{4}\)). The undirected graph of this problem is shown in Fig. 2.

Fig. 2

Representation of the Boolean formula as an undirected graph

The adjacency matrix of this graph, with rows corresponding to the two clauses and columns ordered \({x}_{1}\), ¬\({x}_{1}\), \({x}_{2}\), ¬\({x}_{2}\), \({x}_{3}\), ¬\({x}_{3}\), \({x}_{4}\), ¬\({x}_{4}\), is:

$$ M = \begin{pmatrix} 1 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 \end{pmatrix} $$
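To make the encoding explicit, the following sketch (our own illustration, assuming the column ordering \({x}_{1}\), ¬\({x}_{1}\), \({x}_{2}\), ¬\({x}_{2}\), … used for the matrix above) builds the clause-literal adjacency matrix from a CNF given as integer-literal clauses and reproduces the two rows shown above for the formula of Fig. 2:

```python
import numpy as np

def literal_index(lit):
    """Column index under the ordering x1, ~x1, x2, ~x2, ..."""
    return 2 * (abs(lit) - 1) + (0 if lit > 0 else 1)

def adjacency_matrix(cnf, n_vars):
    """M[i, j] = 1 iff literal j occurs in clause i (clauses as rows)."""
    M = np.zeros((len(cnf), 2 * n_vars), dtype=np.float32)
    for i, clause in enumerate(cnf):
        for lit in clause:
            M[i, literal_index(lit)] = 1.0
    return M

# (x1 v ~x2 v x3) ^ (~x1 v x2 v x4), the formula of Fig. 2
print(adjacency_matrix([[1, -2, 3], [-1, 2, 4]], n_vars=4))
```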

3.2 Graph Neural Network

Our network consists of three multilayer perceptrons (\({\text{L}}_{msg}\), \({\text{C}}_{msg}\), \({\text{L}}_{vote}\)), two LSTMs (\({\mathrm{L}}_{lstm}\), \({\mathrm{C}}_{lstm}\)), and two multi-headed graph attention network layers (\({L}_{layer}\), \({C}_{layer}\)). An iteration consists of three phases: in the first, a message-passing step, each clause receives messages from its neighboring literals and updates its embedding; in the second, each literal receives messages from its neighboring clauses and from its complementary literal and updates its embedding accordingly; after T rounds of passing (T = 26 in this paper), each node contains information about its multi-order neighbors, and in the third phase it is fed to the attention network to compute the attention scores \({a}_{ij}\). These node vectors are updated in each GAT-SAT iteration. The input to our model is just the adjacency matrix M over any number of literals and clauses, so it can be trained and evaluated on problems of any size.

3.2.1 Message Passing

In a SAT instance, satisfiability is independent of the names of clauses and literals. To exploit this property, we initialize each clause \({\mathrm{C}}_{i}\) as a vector \({C}_{init}\in {R}^{d}\), with all clause feature vectors collected in the matrix \({C}^{(t)}\), and each literal \({l}_{i}\) as another vector \({L}_{init}\in {R}^{d}\), with all literal feature vectors collected in the matrix \({L}^{(t)}\in {R}^{2n\times d}\). For \({\mathrm{L}}_{lstm}\) and \({\mathrm{C}}_{lstm}\), we also maintain hidden states \({L}_{h}^{\left(t\right)}\in {R}^{2n\times d}\) and \({C}_{h}^{\left(t\right)}\in {R}^{m\times d}\), where m is the number of clauses and d is the embedding size (set to 128 in this paper). \({S}_{wap}\) is a function that exchanges each literal with its negation. Formally, the message-passing computation in each iteration is given by the following equations:

$$ C^{{\left( {t + 1} \right)}} ,C_{h}^{{\left( {t + 1} \right)}} \leftarrow C_{lstm} \left( {\left[ {C_{h}^{\left( t \right)} ,M^{T} L_{msg} \left( {L^{t} } \right)} \right]} \right), $$
(1)
$$ L^{{\left( {t + 1} \right)}} ,L_{h}^{{\left( {t + 1} \right)}} \leftarrow L_{lstm} \left( {\left[ {L_{h}^{\left( t \right)} ,S_{wap} \left( {L^{\left( t \right)} } \right),MC_{msg} \left( {C^{t + 1} } \right)} \right]} \right). $$
(2)
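A minimal PyTorch sketch of one round of Eqs. (1)-(2) is given below. It is our own illustration, not the authors' implementation: M is assumed to be the (2n × m) literal-clause matrix (the transpose of the clause-row matrix of Sect. 3.1, matching the orientation used in the equations), and the LSTM hidden state is carried as recurrent state rather than concatenated into the input.

```python
import torch
import torch.nn as nn

def mlp(d):
    """Small stand-in for the L_msg / C_msg MLPs."""
    return nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))

class MessagePassing(nn.Module):
    """One round of literal/clause message passing, a sketch of Eqs. (1)-(2)."""

    def __init__(self, d=128):
        super().__init__()
        self.L_msg, self.C_msg = mlp(d), mlp(d)
        self.C_lstm = nn.LSTMCell(d, d)        # input: M^T @ L_msg(L)
        self.L_lstm = nn.LSTMCell(2 * d, d)    # input: [swap(L), M @ C_msg(C)]

    @staticmethod
    def swap(L):
        """Exchange each literal with its complement, assuming the interleaved
        ordering x1, ~x1, x2, ~x2, ... used for the adjacency matrix."""
        return L.reshape(-1, 2, L.shape[-1]).flip(1).reshape(L.shape)

    def forward(self, M, L, C, L_h, C_h):
        # Eq. (1): clauses aggregate messages from their literals.
        C, C_h = self.C_lstm(M.t() @ self.L_msg(L), (C, C_h))
        # Eq. (2): literals aggregate messages from clauses and complements.
        L_in = torch.cat([self.swap(L), M @ self.C_msg(C)], dim=1)
        L, L_h = self.L_lstm(L_in, (L, L_h))
        return L, C, L_h, C_h
```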

3.2.2 Graph Attention Mechanism

After message passing, each node contains not only its own feature information but also that of its multi-order neighbor nodes. We compute the attention scores of literal nodes for clause nodes and of clause nodes for literal nodes, respectively, then use these scores to obtain the attention weights of clause nodes for literal nodes and of literal nodes for clause nodes, as well as a weighted representation of each node. Note that we compute the attention scores of each node with respect to all nodes, but only update the attention weights of each node for its neighbor nodes. Formally, at the end of message passing, the single-layer graph attention computation proceeds as follows:

$$ e_{ij} = \mathrm{LeakyReLU}\left( \vec{a}^{T} \left[ W\vec{h}_{i} \, \| \, W\vec{h}_{j} \right] \right), $$
(3)
$$ a_{ij} = soft\max \left( {e_{ij} } \right) = \frac{{\exp \left( {e_{ij} } \right)}}{{\mathop \sum \nolimits_{{k \in N_{i} }} \exp \left( {\left( {e_{ik} } \right)} \right)}}, $$
(4)
$$ \vec{h}^{\prime}_{i} = \sigma \left( \sum\nolimits_{j \in N_{i}} a_{ij} W \vec{h}_{j} \right), $$
(5)

where \({\overrightarrow{h}}_{i}\) and \({\overrightarrow{h}}_{j}\) are the representations of literal nodes and clause nodes, respectively, \({\overrightarrow{a}}^{T}\) is the attention weight vector, \({\varvec{W}}\) is a trainable projection matrix, \(\upsigma \) is the ELU activation function, and \(||\) denotes concatenation. To make the coefficients of different nodes comparable, we normalize them with the softmax function to obtain the attention weight \({a}_{ij}\) of node \({\overrightarrow{h}}_{j}\) for node \({\overrightarrow{h}}_{i}\), which is then used to update the feature representation of node \({\overrightarrow{h}}_{i}\). \({\overrightarrow{h}}_{i}^{\mathrm{^{\prime}}}\) is the output of a single attention layer.

The input to our layer is a set of node features: \({\overrightarrow{h}}_{i}\in {R}^{2n\times d}\) and \({\overrightarrow{h}}_{j}\in {R}^{m\times d}\) are the representations of literal nodes and clause nodes, where n is the number of variables (so there are 2n literal nodes), m is the number of clause nodes, and d is the number of features per node. \({e}_{ij}\) indicates the importance of node j's features to node i. To obtain sufficient expressive power to transform the input features into higher-level features, at least one learnable linear transformation is required; as an initial step, a shared linear transformation, parametrized by a weight matrix W ∈ \({R}^{d\times d}\), is applied to every node. \(||\) denotes concatenation, so \({\varvec{W}}{\overrightarrow{h}}_{i}|| {\varvec{W}}{\overrightarrow{h}}_{j}\in {\mathrm{R}}^{2n\times m\times 2d}\). We then perform self-attention on the nodes: a shared attentional mechanism, parametrized by \(\overrightarrow{a}\in {R}^{2d\times 1}\), maps \({\mathrm{R}}^{2n\times m\times 2d}\to {R}^{2n\times m}\) and computes the attention coefficients, as shown in Fig. 3.

Fig. 3

Single-layer attention score calculation process
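The following PyTorch sketch (our own illustration, using the same (2n × m) adjacency mask as in the message-passing sketch) implements the single-head computation of Eqs. (3)-(5): scores are computed for all literal-clause pairs, but the softmax is restricted to neighbors.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BipartiteAttention(nn.Module):
    """Single attention head over the literal-clause graph, a sketch of Eqs. (3)-(5)."""

    def __init__(self, d=128):
        super().__init__()
        self.W = nn.Linear(d, d, bias=False)      # shared projection W
        self.a = nn.Linear(2 * d, 1, bias=False)  # attention vector a^T

    def forward(self, h_i, h_j, adj):
        # h_i: (2n, d) literal features, h_j: (m, d) clause features,
        # adj: (2n, m) mask with 1 where literal i occurs in clause j.
        Wh_i, Wh_j = self.W(h_i), self.W(h_j)
        # Eq. (3): e_ij = LeakyReLU(a^T [W h_i || W h_j]) for every pair (i, j).
        pair = torch.cat([Wh_i.unsqueeze(1).expand(-1, Wh_j.size(0), -1),
                          Wh_j.unsqueeze(0).expand(Wh_i.size(0), -1, -1)], dim=-1)
        e = F.leaky_relu(self.a(pair).squeeze(-1))          # (2n, m)
        # Eq. (4): softmax only over the neighbours N_i (assumes every
        # literal appears in at least one clause).
        alpha = torch.softmax(e.masked_fill(adj == 0, float("-inf")), dim=-1)
        # Eq. (5): weighted aggregation followed by the ELU nonlinearity.
        return F.elu(alpha @ Wh_j)                          # (2n, d)
```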

It is difficult to capture all the feature information of neighboring nodes with a single attention computation. To stabilize the learning process of the attention mechanism, we use multi-headed attention: a single-layer attention network is executed K times and the outputs are concatenated as the input of the final output layer, as shown in Eq. (6). To obtain the final representation \({\overrightarrow{h}}_{i}^{\boldsymbol{^{\prime}}\boldsymbol{^{\prime}}}\) of a literal node, the final output layer of the graph attention network computes attention scores \({a}_{ij}^{^{\prime}}\) over the concatenated feature vectors. The computation of the multi-head graph attention network is as follows.

$$ \overrightarrow {h}_{i}^{^{\prime}} = \mathop{\Big\Vert}\limits_{k = 1}^{K} \sigma \left( \sum\nolimits_{j \in N_{i}} a_{ij}^{k} W^{k} \overrightarrow {h}_{j} \right), $$
(6)
$$ \overrightarrow {h}_{i}^{^{\prime\prime}} = \sigma \left( {\sum\nolimits_{j \in Ni} {\left( {a_{ij}^{^{\prime}} W^{k} \overrightarrow {h}_{j}^{^{\prime}} } \right)} } \right). $$
(7)
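A corresponding multi-head sketch is shown below. It is again only an illustration: in the paper the final output layer is itself an attention pass with scores \({a}_{ij}^{^{\prime}}\) (Eq. 7), whereas this sketch replaces that pass with a linear map that merely preserves the 1024 → 128 shape reported in Sect. 5.1. BipartiteAttention is the single-head sketch given above.

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """K independent heads whose outputs are concatenated as in Eq. (6),
    followed by a simplified stand-in for the output layer of Eq. (7)."""

    def __init__(self, d=128, K=8):
        super().__init__()
        self.heads = nn.ModuleList([BipartiteAttention(d) for _ in range(K)])
        self.out = nn.Linear(K * d, d, bias=False)   # 1024 -> 128 with d=128, K=8

    def forward(self, h_i, h_j, adj):
        h_cat = torch.cat([head(h_i, h_j, adj) for head in self.heads], dim=-1)
        return self.out(h_cat)                       # (2n, d)
```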

After the attention scores have been learned, the node vectors contain weighted structural information. We apply a multilayer perceptron \({L}_{Vote}\) with hidden size 128 to the feature vector of each literal to extract categorical information and obtain a vote for each literal, and we then compute the mean y of the literal votes to obtain the final prediction. Our model is trained by minimizing the cross-entropy loss between y and the true label φ(P).
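A sketch of this readout, under the same assumptions as the previous snippets, might look as follows; the hypothetical training-step comments at the end show how the mean vote y would be trained with binary cross-entropy and thresholded at 0.5.

```python
import torch
import torch.nn as nn

class LiteralVote(nn.Module):
    """Readout sketch: an MLP (hidden size 128) scores each literal and the
    scores are averaged into a single logit y for the whole instance."""

    def __init__(self, d=128):
        super().__init__()
        self.L_vote = nn.Sequential(nn.Linear(d, d), nn.ReLU(),
                                    nn.Linear(d, d), nn.ReLU(),
                                    nn.Linear(d, 1))

    def forward(self, h_lit):
        return self.L_vote(h_lit).mean()     # mean of the 2n literal votes

# Hypothetical training step: label is 1.0 for a satisfiable instance.
# loss = nn.functional.binary_cross_entropy_with_logits(y, label)
# prediction: satisfiable iff torch.sigmoid(y) > 0.5
```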

4 Datasets

To predict problem satisfiability, we generated two different kinds of random SAT problems. Uniformly distributed random 3-SAT problems are simple to generate but challenging to solve in practice: MiniSAT [28] takes only milliseconds for instances with 100 variables, sometimes a minute for instances with 300 variables, and several hours for harder instances with 600 variables. The running time for solving random 3-SAT problems increases exponentially with problem size [29]. With the number of variables fixed, the probability that a randomly generated formula is satisfiable approaches 100% as the number of clauses decreases (most problems are under-constrained) and approaches 0% as the number of clauses increases (most problems are over-constrained). For intermediate numbers of clauses, this probability does not change gradually but undergoes a sharp phase transition at a critical point (where the ratio of clauses to variables is about 4.26) at which the probability that the formula is satisfiable is exactly 50%. To obtain the required datasets quickly, we created five datasets at this critical point, each with a fixed number of variables: 100, 150, 200, 250, and 300. Each dataset contains 10,000 instances, which we randomly divided into training, validation, and test sets in the ratio 8:1:1, with satisfiable and unsatisfiable instances split evenly.
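A generator for such instances can be sketched as follows (our own illustration; the satisfiability labels would still have to be obtained with a complete solver such as MiniSAT):

```python
import random

def random_3sat(n_vars, ratio=4.26, seed=None):
    """Uniform random 3-SAT instance near the phase transition: about
    ratio * n_vars clauses, each over 3 distinct variables, with each
    variable negated independently with probability 1/2."""
    rng = random.Random(seed)
    cnf = []
    for _ in range(round(ratio * n_vars)):
        variables = rng.sample(range(1, n_vars + 1), 3)
        cnf.append([v if rng.random() < 0.5 else -v for v in variables])
    return cnf

cnf = random_3sat(100, seed=0)   # ~426 clauses over 100 variables
```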

Second, we generated small random SAT problems that do not follow the 3-SAT pattern: the number of literals per clause is not fixed, ranging from as few as two to as many as seven, and the number of clauses per problem is likewise not fixed, ranging from a few dozen to more than 200. Since predicting satisfiability requires a large number of training instances, we created two datasets, SR(3, 10) and SR(10, 40), using the same method as Selsam et al. [9], each with a training set of 300,000 instances, a validation set of 30,000, and a test set of 30,000, in which instances appear in pairs, one satisfiable and one unsatisfiable. The only difference between the satisfiable and unsatisfiable member of a pair is that a single literal in one clause is inverted. SR(3, 10) contains instances with between 3 and 10 variables, while SR(10, 40) contains instances with between 10 and 40 variables. These problems are trivial for modern solvers, but they are not simple as machine learning classification problems, because SAT problems are highly structured and changing a single variable can flip a formula from satisfiable to unsatisfiable, so our models must correctly identify and capture the invariant structural features. We also built the dataset \({SR}_{5000}\)(3, 10) to evaluate the model's capacity to learn satisfiability features from extremely few samples, using just 5000 instances for training, 3000 for validation, and 3000 for testing.

We now detail how the distribution SR(N) is created. To generate a random clause over n variables, SR(N) first samples a small integer k (with a mean value slightly above 4), then samples k variables uniformly at random, and finally negates each of them independently with probability 50%. It generates clauses \({C}_{i}\) in this way, adds them to the SAT instance, and queries a traditional SAT solver (we use MiniSAT [28]) after each addition, stopping as soon as the clause \({C}_{m}\) makes the problem unsatisfiable. At this point \(\left\{{C}_{1},\dots ,{C}_{m-1}\right\}\) is satisfiable, so we invert a single literal of \({C}_{m}\) to obtain a satisfiable problem \(\left\{{C}_{1},\dots ,{C}_{m-1},{C}_{m}^{^{\prime}}\right\}\). The pair \(\left\{{C}_{1},\dots ,{C}_{m-1},{C}_{m}^{^{\prime}}\right\}\) and \(\left\{{C}_{1},\dots ,{C}_{m-1},{C}_{m}\right\}\) constitutes the paired random SAT problems in the distribution SR(N).
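The following sketch mirrors this procedure. It is an illustration only: it assumes the PySAT package for the incremental MiniSAT queries, and the distribution of k is a placeholder rather than the exact one used by Selsam et al. [9].

```python
import random
from pysat.solvers import Minisat22   # assumes the PySAT package, not the authors' code

def generate_sr_pair(n_vars, sample_k, seed=None):
    """Sketch of the SR(N) pair construction described above: add random clauses
    until the formula turns unsatisfiable, then flip one literal of the last
    clause to obtain the satisfiable twin."""
    rng = random.Random(seed)
    clauses = []
    with Minisat22() as solver:
        while True:
            k = min(sample_k(rng), n_vars)
            variables = rng.sample(range(1, n_vars + 1), k)
            clause = [v if rng.random() < 0.5 else -v for v in variables]
            solver.add_clause(clause)
            clauses.append(clause)
            if not solver.solve():               # C_m made the problem UNSAT
                break
    unsat = clauses
    sat = clauses[:-1] + [[-clauses[-1][0]] + clauses[-1][1:]]  # invert one literal of C_m
    return sat, unsat

# Placeholder distribution for k (mean a little above 4).
sat, unsat = generate_sr_pair(n_vars=10,
                              sample_k=lambda rng: 2 + rng.randint(1, 4),
                              seed=0)
```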

5 Experiment and Results

5.1 Experimental Setup

For the message-passing part, we use the hyperparameter settings of Selsam et al. [9]: the hidden units and the literal and clause embeddings have dimension 128, each MLP has three hidden layers and a linear output layer, and we regularize the parameters with the \({\mathcal{l}}_{2}\) norm using a scaling factor of \({10}^{-10}\). For each problem, we perform 26 message-passing iterations. At the end of message passing, we use two multi-headed attention layers with K = 8, one to compute the attention scores of literal nodes for clause nodes and the other to compute the attention scores of clause nodes for literal nodes. Every single attention layer has input and output features of dimension 128; the last output layer has an input feature dimension of 1024 (8 × 128) and an output feature dimension of 128. We trained our model with the ADAM optimizer [31] and use binary cross-entropy (BCE) as the loss function.
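For reference, these settings can be summarized in a single (hypothetical) configuration dictionary; the names below are ours and do not correspond to any released code.

```python
# Hypothetical configuration collecting the hyperparameters of Sect. 5.1.
config = dict(
    embedding_dim=128,        # literal/clause embeddings and hidden units
    mlp_hidden_layers=3,      # each MLP: 3 hidden layers + linear output
    l2_scale=1e-10,           # scaling factor of the l2 regularization term
    message_passing_rounds=26,
    attention_heads=8,        # two K = 8 multi-head attention layers
    final_layer_in=8 * 128,   # 1024 -> 128 output layer
    final_layer_out=128,
    optimizer="Adam",
    loss="binary cross-entropy",
)
```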

5.2 Experimental Results

5.2.1 SR(N)

We compared GAT-SAT with the advanced neural embedding framework NeuroSAT [9], as shown in Table 1, where bold values indicate the best performance. We first trained and tested on SR(3, 10); our model and NeuroSAT used the same dataset rather than regenerating it, and both ran for only one epoch over all the data. NeuroSAT and GAT-SAT both achieve good accuracy after a single epoch; for the SR(10, 40) problem, we report the final result after 50 epochs. On SR(3, 10), NeuroSAT achieved a training accuracy of 0.786 and a test accuracy of 0.830, while GAT-SAT achieved a training accuracy of 0.879 and a test accuracy of 0.891. We then perform experiments on the more difficult SR(10, 40) to demonstrate that our model is not limited to simple instances. In a previous study, NeuroSAT was trained on nearly 10 million problems from this distribution and achieved a prediction accuracy of 0.85 on SR(40) [9], so a training set of only 300,000 is not an easy task for NeuroSAT. As shown in Fig. 4, the accuracy of GAT-SAT on SR(10, 40) reached 0.802 after one epoch and finally converged to 0.897 after 29 epochs, while NeuroSAT showed no inflection point on this dataset even though we trained it for more than 50 epochs over more than a week. For the simple but much smaller \({\mathrm{SR}}_{5000}\) (3, 10) dataset, as shown in Fig. 5, we ran several experiments, each time taking the validation results of the first 50 epochs, and found that the performance of NeuroSAT is not stable. We show the performance of GAT-SAT when NeuroSAT performs better (right panel) and worse (left panel) as a comparison. NeuroSAT's accuracy improves rapidly only after sufficient training and then converges to a higher level, while GAT-SAT almost always completes this process within just a few epochs. Our model almost always starts to converge within single-digit epochs during training and quickly reaches a satisfactory accuracy, whereas NeuroSAT needs a dozen epochs in the best case and about 30 epochs in the worst case before it starts to converge.

Table 1 Experimental results on the SR(3, 10) dataset
Fig. 4

Comparison of results per epoch on SR(10, 40)

Fig. 5

Performance of GAT-SAT compared with NeuroSAT when NeuroSAT performs better (right panel) and worse (left panel)

5.2.2 Random 3-SAT

On the five random 3-SAT datasets, we evaluated the prediction accuracy of GAT-SAT and NeuroSAT and observed that prediction accuracy does not decrease as problem size increases. We report the training, validation, and test accuracies of NeuroSAT and GAT-SAT on the five datasets, compare the test accuracies of the two models, and bold the better results. As shown in Table 2, on the dataset with 100 variables, NeuroSAT obtained a test accuracy of 0.745 and GAT-SAT a test accuracy of 0.784; on the dataset with 300 variables, NeuroSAT reached 0.799 and GAT-SAT outperformed it by 2%; across all test sets, GAT-SAT improved prediction accuracy by 1–4% over NeuroSAT. We also note that the difference between NeuroSAT and GAT-SAT on the validation sets is not large, but on the test sets NeuroSAT falls below its validation accuracy almost every time, while GAT-SAT shows results comparable to its validation accuracy. Figure 6 shows the validation accuracy of GAT-SAT and NeuroSAT over 50 epochs on the datasets with 100, 150, 200, 250, and 300 variables. In the first few epochs on these datasets, GAT-SAT significantly outperforms NeuroSAT; as training continues, the gap narrows considerably, and on the datasets with 200 and 300 variables NeuroSAT gains a slight advantage over GAT-SAT. To verify that the model captures general structural features of SAT instances rather than simply learning features of a fixed size, we used the model with the highest validation accuracy on the 100-variable dataset to predict the satisfiability of the larger problems. The results are shown in Table 3: even when trained only on the 100-variable dataset, NeuroSAT and GAT-SAT still perform well on the other datasets, and GAT-SAT retains a prediction accuracy 1–4% higher than NeuroSAT.

Table 2 Experimental results on the random 3-SAT problem dataset
Fig. 6

Validation accuracy of the models on the datasets with 100, 150, 200, 250, and 300 variables

Table 3 Experimental results when trained on 100 variables and tested on other scales

5.3 Impact of Attention Layer

The traditional graph attention mechanism has an obvious drawback for the SAT problem: it works poorly for multi-order neighbor nodes. We tackle this problem by performing a message-passing step before computing the attention, so that each node contains information about its multi-order neighbors. To show more clearly what our attention layer does after message passing, we visualize the attention scores of literal nodes for neighboring clause nodes on a very small problem with only 5 variables and 35 clauses. As shown in Fig. 7, the horizontal coordinates of the heat map are the literal nodes: for convenience, the first five columns represent the positive literals of the five variables and the last five columns their negations, while the vertical coordinates represent the clause nodes. The color of an intersection point represents the importance of the adjacent clause to the literal itself: the darker the color, the more important it is. After learning the attention scores, every SAT instance we generate assigns a different weight to each node, and these weights guide the final voting of the nodes. In fact, for any SAT problem, large or small, the variables whose assignments can be inferred from the assignments of other variables during Boolean constraint propagation (BCP) are less important; only a few variables affect the satisfiability of the whole problem. If they can all be assigned correctly, the SAT problem is satisfiable and such a solution is found; otherwise, the problem is unsatisfiable. We speculate that GAT-SAT achieves faster and more accurate satisfiability prediction precisely by learning the attention scores among nodes and assigning higher weights to the essential literal nodes.

Fig. 7

Visualization of the attention scores of literal nodes for neighboring clause nodes on a small problem
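A heat map of this kind can be produced with a few lines of matplotlib; the snippet below uses random placeholder values in place of the learned attention weights and is only meant to illustrate the layout of Fig. 7.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data standing in for the learned weights a_ij of Fig. 7:
# 35 clause nodes (rows) x 10 literal nodes (columns, x1..x5 then ~x1..~x5).
attn = np.random.rand(35, 10)

plt.figure(figsize=(4, 8))
plt.imshow(attn, cmap="viridis", aspect="auto")
plt.xticks(range(10), [f"x{i}" for i in range(1, 6)] + [f"~x{i}" for i in range(1, 6)])
plt.xlabel("literal node")
plt.ylabel("clause node")
plt.colorbar(label="attention weight")
plt.tight_layout()
plt.show()
```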

5.4 Ablation Study

In the previous subsections, we validated the effectiveness of our proposed model, in which the graph attention network acts as a feature extractor. To verify its effectiveness in extracting features, we set up three groups of experiments, changing only the feature extraction part of the model; the results are shown in Table 4. The first group uses the traditional graph attention network as the feature extractor, the second group uses only the message-passing network, and the third group is our proposed GAT-SAT. We train for 50 epochs on two datasets, SR(3, 10) and random 3-SAT with 100 variables, and report the results on the test set. As expected, the traditional graph attention network, which aggregates information only from first-order neighbor nodes, ultimately fails to find a suitable way to solve the problem. The message-passing model learns to solve the problems within the dataset and achieves good accuracy. The highest accuracy is achieved by our GAT-SAT method, which improves on the message-passing model by 5.9–9.2%. These experiments demonstrate that our improved graph attention network plays an important role in improving the prediction accuracy of our model, especially on difficult problems.

Table 4 Results of the ablation study

6 Conclusions and Future Work

Graph attention networks have performed well on various tasks in recent years. In this paper, we use graph attention networks for the first time to solve propositional logic problems and propose GAT-SAT, a new framework for predicting the satisfiability of random SAT problems. Our model consistently outperforms NeuroSAT in prediction accuracy on both random SAT and random 3-SAT datasets. We visualize how GAT-SAT achieves this: after the graph attention network learns the attention scores, it assigns different weights to the embeddings of literal and clause nodes, concentrating on the feature embeddings of important nodes when predicting satisfiability. Our work shows that the graph attention network can learn and capture the invariant structural features of the key nodes of propositional logic problems. With the same absence of hand-crafted features and only comparable supervision, GAT-SAT achieves higher accuracy with only half the epochs of the message-passing model NeuroSAT. In future work, we will try to use graph attention networks to solve more practical problems.