1 Introduction

Discovering and developing drugs is a challenging and expensive endeavor that requires considerable time and resources. To reduce the computational cost and errors of screening potential compounds, artificial intelligence, especially deep learning models, has been widely used in drug discovery and development. These models can help identify promising molecular candidates faster and more accurately, particularly in virtual screening (VS), a key step in drug discovery and development [1,2,3,21].

The present study proposes a deep learning architecture called GMPP-NN that integrates an MPNN (message-passing neural network) with an MLP classifier, a machine learning model, to predict molecular properties. This architecture was evaluated on four datasets (HIV, BACE, BBBP, and ClinTox) from MoleculeNet, a comprehensive benchmark platform for molecular deep learning and machine learning.

Our work makes several key contributions in the field of graph-based deep learning, particularly for molecular property prediction problems in drug discovery and development. We propose the GMPP-NN architecture, which is designed to be flexible and applicable to a wide range of problems that use graph data as input. The experimental results indicate that GMPP-NN can achieve competitive performance when compared to other advanced methods in the field of drug discovery and development. Additionally, we show that GMPP-NN is a flexible architecture that can be modified to use different variants of GNN, such as graph convolutional networks (GCN), and can also be used for regression problems by changing the classification model used in the final stage. These contributions demonstrate the potential of GMPP-NN as a powerful architecture for solving graph-based problems and show its effectiveness for classification problems.

In the next section, we will detail the experimental setups of our datasets and their respective properties. Next, we will give a description of the MPNN model for molecular embedding and our GMPP-NN architecture, followed by a discussion of the architecture’s performance results compared to the existing methods.

2 Materials and methods

2.1 Define features

The bond and atom features are initialized as shown in Tables 1 and 2, respectively. The initial features of a node in the MPNN are the features of the corresponding atom, and the initial features of an edge vw are the features of the corresponding bond. All features are computed with the RDKit package [22].

Table 1 Atom Features
Table 2 Bond Features

2.2 Data collection

In this research study, our datasets come from the benchmark MoleculeNet. For the classification challenge, four datasets (HIV, BACE, BBBP, and ClinTox) were used, covering two domains (physiology and biophysics). Each dataset was divided into a training set (75%), a validation set (20%), and a test set (5%). The model was trained on the training set, its hyperparameters were tuned based on the validation set's results, and its performance was ultimately evaluated on the test set [23].
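A minimal sketch of such a 75/20/5 split, assuming a simple seeded random shuffle (`split_dataset` is a hypothetical helper for illustration; MoleculeNet's own splitters may differ):

```python
import random

def split_dataset(items, train_frac=0.75, val_frac=0.20, seed=42):
    """Randomly split a sequence into train/validation/test subsets (75/20/5)."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded shuffle for reproducibility
    n = len(items)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]      # remaining 5%
    return train, val, test

train, val, test = split_dataset(range(1000))
```

With 1000 molecules this yields 750/200/50 examples, and every molecule lands in exactly one subset.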

The HIV dataset comes from the Drug Therapeutics Program AIDS Antiviral Screen, which tested approximately 40,000 compounds for their ability to block HIV replication. The results were classified into two categories: confirmed inactive compounds and confirmed active compounds [33].

The BACE dataset provides quantitative and qualitative binding data for a set of human beta-secretase 1 (BACE-1) inhibitors, drawn from experimental values published in the scientific literature. It consists of 1522 chemical compounds with binary classification labels for a single protein target [34].

The BBBP (Blood-Brain Barrier Penetration) dataset addresses the modeling and prediction of barrier permeability, an essential consideration in developing drugs targeting the central nervous system, since the barrier blocks most drugs, neurotransmitters, and hormones. The dataset contains binary classification labels for more than 2000 chemical compounds [35].

The ClinTox dataset differentiates between pharmaceuticals that have received FDA approval and compounds that have experienced failure in clinical trial stages due to toxicity-related problems. The dataset consists of 1491 pharmacological compounds and includes two classification tasks: predicting clinical trial toxicity and determining FDA approval status. The dataset is obtained from the SWEETLEAD database, presenting significant insights into the distinctions between successful and unsuccessful drug candidates [36].

In all four datasets, molecules are represented as SMILES strings. The proposed architecture addresses the classification problems, predicting molecular properties across the datasets summarized in Table 3.

Table 3 Summary of the datasets

2.3 Generate graph

We employed a multi-step process to generate complete graphs from the SMILES representations of chemical compounds, as shown in Fig. 1.

Fig. 1

Schema of converting SMILES dataset to graph

The process of creating a molecular graph consists of two sequential steps. In the initial step, a SMILES string is taken as input, and the result is the creation of a molecule object. Subsequently, in the second step, the generated molecule object serves as input for the creation of a graph, which includes atom features, bond features, and pair indices. This two-step process is fundamental for converting a chemical structure represented by a SMILES string into a structured molecular graph, facilitating further computational analysis and modeling.
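The second step can be illustrated with a hand-rolled toy: given a molecule object (atom symbols plus a bond list), build the atom-feature, bond-feature, and pair-index arrays. The tiny element and bond-type vocabularies here are illustrative assumptions; the paper computes real features with RDKit.

```python
def molecule_to_graph(atoms, bonds):
    """Convert a toy molecule object (atom symbols + bond list) into the graph
    triple described in the text: atom features, bond features, pair indices."""
    elements = ["C", "N", "O"]          # toy vocabulary (assumption)
    bond_types = ["single", "double"]   # toy vocabulary (assumption)

    # one-hot atom features, one row per atom
    atom_features = [
        [1.0 if sym == e else 0.0 for e in elements] for sym in atoms
    ]
    bond_features, pair_indices = [], []
    for v, w, btype in bonds:
        onehot = [1.0 if btype == b else 0.0 for b in bond_types]
        # store both directions so message passing sees edges vw and wv
        bond_features += [onehot, onehot]
        pair_indices += [(v, w), (w, v)]
    return atom_features, bond_features, pair_indices

# ethanol (SMILES "CCO"): step 1 would yield this molecule object
atoms = ["C", "C", "O"]
bonds = [(0, 1, "single"), (1, 2, "single")]
af, bf, pi = molecule_to_graph(atoms, bonds)
```

For ethanol this produces 3 atom-feature rows and 4 directed edges, the shape of input the MPNN consumes.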

Using attributed molecular graphs, a wide range of deep learning techniques can be applied to learn molecular structures and extract useful information. The ability of models to capture crucial properties of atoms and bonds within molecules is made feasible by these features, which help tasks such as property prediction, reaction prediction, and drug design [30].

2.4 Model architecture

Molecular graph embedding is an important area of study in graph neural networks. It typically involves three main components: atom-level embedding (which represents atoms in a graph as vectors), bond-level embedding (which represents bonds as vectors), and molecule-level embedding (which represents the whole molecule as a vector). In this study, we use the term graph embedding to refer specifically to molecule-level embedding: generating a vector representation for a molecule, which can then be used as input to our GMPP-NN architecture. The embedding model in our architecture is a message-passing network that generates this graph representation vector [39].

2.4.1 Message passing network

MPNN is a neural network model designed for graph data, with variants that handle both undirected and directed graph structures and applications. For simplicity, we describe an MPNN that operates on molecular graph data g with node features \(X_V\) and edge features \(e_{VW}\). As depicted in Fig. 2, the graph is a molecule containing atoms as nodes and bonds as edges.

Fig. 2

Message passing neural network (MPNN) for molecular embedding

The forward pass has two different phases: the message-passing phase and the readout phase. The message passing phase is characterized by the message functions \(M_t\) and vertex update functions \(U_t\), and it covers a period of T time steps. During the message passing phase, the hidden states \(h_V^{t}\) at each atom node in the network are updated using messages \(m_V^{t+1}\) in the following way:

$$\begin{aligned} m_V^{t+1}&= \sum \limits _{W \in N(V)} M_t\left(h_V^t, h_W^t, e_{VW}\right) \end{aligned}$$
(1)
$$\begin{aligned} h_V^{t+1}&= U_t\left( h_V^t, m_V^{t+1}\right) \end{aligned}$$
(2)

where \(h_{V}^{0}\) is a function of the initial atom features \(x_{V}\), and N(V) is the set of neighbors of V in the graph.

The graph readout phase computes:

$$\begin{aligned} g=R\left(H^{T}, H^{0}\right) \end{aligned}$$
(3)

where g is the graph embedding. The readout function R combines and transforms the initial and final atom node states to produce a single graph embedding.
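Equations (1)-(3) can be sketched as a minimal message-passing loop. The implementation below is a toy illustration with random, untrained weights: the message function \(M_t\) is a linear map over the concatenated neighbor state and bond feature, the update \(U_t\) is a simple tanh layer, and the readout R sums the initial and final node states.

```python
import numpy as np

def mpnn_embed(atom_feats, pair_indices, bond_feats, T=3, dim=8, seed=0):
    """Toy MPNN following Eqs. (1)-(3); weights are random, not trained."""
    rng = np.random.default_rng(seed)
    # h_v^0: a linear function of the initial atom features x_v
    h0 = atom_feats @ rng.standard_normal((atom_feats.shape[1], dim))
    Wm = rng.standard_normal((dim + bond_feats.shape[1], dim)) * 0.1  # M_t weights
    Wu = rng.standard_normal((2 * dim, dim)) * 0.1                    # U_t weights
    h = h0
    for _ in range(T):
        m = np.zeros_like(h)
        # Eq. (1): each directed edge (v, w) sends a message from w to v
        for k, (v, w) in enumerate(pair_indices):
            m[v] += np.tanh(np.concatenate([h[w], bond_feats[k]]) @ Wm)
        # Eq. (2): update every node state from its old state and its message
        h = np.tanh(np.concatenate([h, m], axis=1) @ Wu)
    # Eq. (3): readout R sums final and initial node states into one vector g
    return (h + h0).sum(axis=0)

atom_feats = np.eye(3)                      # 3 atoms with one-hot features
bond_feats = np.ones((4, 2))                # 4 directed edges, 2-dim bond features
pair_indices = [(0, 1), (1, 0), (1, 2), (2, 1)]
g = mpnn_embed(atom_feats, pair_indices, bond_feats)
```

Whatever the molecule's size, the readout collapses the node states into a fixed-length vector g, which is what the downstream classifier consumes.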

2.4.2 GMPP-NN architecture

Many deep learning architectures have been proposed for predicting molecular properties. ST (SMILES Transformer) uses pre-training and a text-based molecular representation to improve performance, especially on small datasets [31]. FP2VEC (Fingerprint to Vector) develops a fingerprint-based embedding that, combined with a convolutional neural network, achieves competitive results in various quantitative structure-activity relationship (QSAR) tasks, particularly classification [32]. The deeper graph neural network (Deeper-GCN) stacks very deep layers; because GCNs face vanishing gradients, over-smoothing, and over-fitting as they deepen, it proposes a novel normalization layer called MsgNorm and a pre-activation version of residual connections [25]. Geometry-enhanced molecular representation (GEM) designs a geometry-based GNN that simultaneously models atoms, bonds, and bond angles within a molecule, using double graphs to encode atom-bond and bond-angle relations [26]. The atom-bond transformer-based message-passing neural network (ABT-MPNN) combines the strengths of MPNNs and Transformers, integrating molecular representations at the bond, atom, and molecule levels; attention mechanisms in the message-passing and readout phases capture local and global information effectively [27]. Our architecture pursues the same goal: it predicts molecular properties from graph-structured molecules, using an MPNN model for graph embedding followed by an MLP classifier for property prediction, as illustrated in Fig. 3.

The molecular dataset, represented as SMILES strings, was converted into graph structures using the graph_from_smiles method, as explained in Fig. 1. The GMPP-NN architecture reads the graph representations of the molecular features with the MPNN and returns a graph embedding g. After the readout phase, the partitioned feature vectors were fed into the MLP classifier (multi-layer perceptron classifier) to predict the different molecular properties.

Our architecture involves transforming SMILES datasets into graph datasets as the first stage. We create a batch of sub-graphs (disconnected graphs), where each sub-graph represents a single molecule, and the MPNN model uses the disconnected graph as input to generate the disconnected graph embedding (global feature vector). After that, we partition the global feature vector into sub-vector features that are used for prediction in the MLP classifier.
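Because each batch is one disconnected graph, per-molecule embeddings are recovered by grouping node states by their molecule membership. The sketch below illustrates this partitioning step; `partition_readout` and the sum-pooling choice are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def partition_readout(node_states, molecule_ids):
    """Split the node states of a batched (disconnected) graph back into one
    embedding per molecule by summing each molecule's own nodes."""
    n_mols = max(molecule_ids) + 1
    out = np.zeros((n_mols, node_states.shape[1]))
    for state, mol in zip(node_states, molecule_ids):
        out[mol] += state   # accumulate node states per molecule
    return out

# a batch of two molecules: 3 atoms + 2 atoms stacked into one disconnected graph
node_states = np.arange(10, dtype=float).reshape(5, 2)
molecule_ids = [0, 0, 0, 1, 1]
emb = partition_readout(node_states, molecule_ids)
```

Each row of `emb` is then a per-molecule feature vector ready for the MLP classifier.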

Fig. 3

The proposed GMPP-NN architecture

2.4.3 Outputs insights from diverse datasets

The output of our architecture comes from the last model of our GMPP-NN architecture (the MLP Classifier model), which contains the following predictions:

  • The BBBP dataset is typically used to predict whether a chemical compound can penetrate the blood-brain barrier (BBB). The BBB serves as a protective interface, maintaining a barrier between the central nervous system and the circulatory system. The prediction of BBBP penetration is essential in drug discovery to determine if a compound can reach the brain to treat neurological conditions.

  • The Clintox dataset is used for predicting the clinical toxicity of chemical compounds. It helps determine if a compound is likely to have adverse effects on human health, making it crucial for drug safety assessment and regulatory purposes.

  • The HIV dataset is used to predict whether a chemical compound can inhibit HIV replication. Understanding this property is vital to the development of antiretroviral drugs to combat HIV/AIDS.

  • The BACE dataset is used for predicting the inhibition of the beta-secretase 1 (BACE-1) enzyme. Inhibition of BACE-1 is a target for the development of drugs to treat Alzheimer’s disease, as BACE-1 is involved in the production of beta-amyloid peptides, which play a role in the disease.

2.4.4 Performance metrics

The AUC (area under the curve) metric evaluates a classifier's ability to separate positive and negative samples. Positive samples are classified as true positives (TP) or false negatives (FN), while negative samples are classified as true negatives (TN) or false positives (FP). The performance of our GMPP-NN architecture was evaluated using the ROC curve (receiver operating characteristic curve) and PRC (precision-recall curve) metrics [40].

The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR). It quantifies the classifier's ability to distinguish between positive and negative instances across a spectrum of classification thresholds. A higher AUC score, which ranges from 0 to 1, indicates better classifier performance; the AUC is computed as the area under the ROC curve. Specifically, TPR is defined as:

$$\begin{aligned} \text {TPR} = \frac{\text {TP}}{\text {TP + FN}} \end{aligned}$$
(4)

while FPR is given by:

$$\begin{aligned} \text {FPR} = \frac{\text {FP}}{\text {FP + TN}} \end{aligned}$$
(5)

These metrics offer insights into the model’s ability to differentiate positive and negative instances accurately.
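The ROC AUC can be computed directly from these two rates: sweep thresholds over the predicted scores, collect the (FPR, TPR) points, and integrate with the trapezoid rule. The sketch below is an illustrative helper, not the paper's evaluation code.

```python
def roc_auc(labels, scores):
    """ROC AUC from Eqs. (4)-(5): threshold sweep + trapezoid integration."""
    P = sum(labels)                 # number of positives
    N = len(labels) - P             # number of negatives
    pts = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= thr)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= thr)
        pts.append((fp / N, tp / P))            # (FPR, TPR) at this threshold
    pts = [(0.0, 0.0)] + pts + [(1.0, 1.0)]     # anchor the curve's endpoints
    return sum((x2 - x1) * (y1 + y2) / 2        # trapezoid rule over the curve
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)   # 0.75 for this small example
```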

The PRC-Curve is an alternative performance measure for binary classifiers that assesses the balance between precision and recall. Precision is the proportion of correct positive predictions among all positive predictions, while recall is the fraction of genuine positive samples that are correctly predicted as positive.

The PRC-Curve depicts precision as a function of recall, with the AUC calculated as the area beneath this curve. A higher PRC AUC suggests superior classifier performance when considering precision and recall. The PRC-Curve is especially beneficial when addressing unbalanced datasets, where the number of negative instances greatly surpasses the number of positives. Recall can be determined as follows:

$$\begin{aligned} \text {Recall} = \frac{\text {TP}}{\text {TP + FN}} \end{aligned}$$
(6)

and precision as:

$$\begin{aligned} \text {Precision} = \frac{\text {TP}}{\text {TP + FP}} \end{aligned}$$
(7)

These metrics quantify the relationship between true positive predictions, false negatives, and false positives, providing valuable insight into the model's performance.
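Under the same counting definitions, the PRC AUC can be approximated by step-wise integration of precision over recall (average-precision style). `pr_auc` below is an illustrative helper, not the paper's implementation.

```python
def pr_auc(labels, scores):
    """PRC AUC from Eqs. (6)-(7): recall = TP/(TP+FN), precision = TP/(TP+FP),
    integrated step-wise over descending score thresholds."""
    P = sum(labels)                 # number of positives
    prev_recall, area = 0.0, 0.0
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= thr)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= thr)
        recall = tp / P
        precision = tp / (tp + fp)
        area += (recall - prev_recall) * precision  # step integration
        prev_recall = recall
    return area

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
ap = pr_auc(labels, scores)
```

On the same toy example as above this gives 5/6, higher than its ROC AUC, illustrating that the two metrics weight errors differently.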

3 Results and discussion

3.1 Model training and validation performance

The GMPP-NN (Graph Molecular Property Prediction Neural Network) model was trained on four different datasets (BBBP, HIV, BACE, and ClinTox) with promising results. For each dataset, training showed progressive improvements in both the ROC-Curve and PRC-Curve scores, as shown in Figs. 4 and 5.

3.1.1 ROC-curve training and validation performance

Fig. 4

ROC-curve training and validation performance

After 40 epochs, the GMPP-NN model demonstrated strong discriminative power in predicting molecular properties in various datasets. In the BBBP dataset, the model achieved a remarkable training ROC-Curve of 0.9577 and a validation ROC-Curve of 0.9285, indicating its ability to distinguish between molecules with blood-brain barrier penetration and those without. In the HIV dataset, the model achieved a training ROC-Curve of 0.8413 and a validation ROC-Curve of 0.8175, suggesting its ability to predict HIV-related molecular properties. In the Clintox dataset, the model achieved a training ROC-Curve of 0.9158 and a validation ROC-Curve of 0.8280, indicating its strong discriminative power in classifying toxicological properties. Finally, in the BACE dataset, the model achieved a training ROC-Curve of 0.8917 and a validation ROC-Curve of 0.8599, demonstrating its ability to predict molecular properties related to BACE inhibition. The GMPP-NN architecture is a versatile tool that excels in molecular property prediction tasks due to its consistent performance improvement during training and validation ROC-Curve scores, making it valuable for diverse task prediction.

3.1.2 PRC-Curve training and validation performance

Fig. 5

PRC-curve training and validation performance

After 40 epochs, the GMPP-NN model demonstrated impressive performance across the BBBP, HIV, ClinTox, and BACE datasets. On BBBP, it achieved a training PRC-Curve of 0.9516 and a validation PRC-Curve of 0.8942, indicating its proficiency in distinguishing between molecules with and without blood-brain barrier penetration. The model also performed strongly on HIV, with a training PRC-Curve of 0.8184 and a validation PRC-Curve of 0.7976. On ClinTox, it classified toxicological properties with high precision and recall. On BACE, the model achieved a training PRC-Curve of 0.8590 and a validation PRC-Curve of 0.8527, indicating its ability to predict molecular properties related to BACE inhibition with a consistent balance between precision and recall. These PRC-Curve scores demonstrate the model's versatility and reliability in molecular property prediction: it provides precise predictions and effectively identifies relevant instances, making it a valuable tool for diverse tasks.

Comparing the ROC-Curve and PRC-Curve metrics, we observe that both metrics exhibited similar trends over the training epochs, as demonstrated by the consistent improvement in performance during training and validation. The ROC-Curve reveals how effectively the model separates positive and negative instances across various decision thresholds, while the PRC-Curve focuses on precision and recall, which is particularly important when dealing with imbalanced datasets.

3.2 ROC-Curve and PRC-Curve metrics prediction performance

The evaluation and comparison of ROC-Curve and PRC-Curve metrics for the GMPP-NN model on the four datasets (BACE, BBBP, ClinTox, and HIV) are presented in Fig. 6.

Fig. 6

Prediction performances of the four datasets using both ROC-curve and PRC-curve metrics

The model's performance on the different datasets was evaluated using both ROC-Curve and PRC-Curve measures. The HIV dataset showed strong performance, with an ROC-Curve of 0.8677 and a slightly lower PRC-Curve of 0.8565. The ClinTox dataset achieved exceptional results, with an ROC-Curve of 0.9795 and an impressive PRC-Curve of 0.9257, indicating excellent discrimination between positive and negative instances. The BBBP dataset showed outstanding performance, with an ROC-Curve of 0.9186 and a somewhat lower PRC-Curve of 0.8757, suggesting good detection of positive instances while maintaining reasonable specificity. The BACE dataset performed well, with an ROC-Curve of 0.8608 and a slightly higher PRC-Curve of 0.8615.

For the HIV dataset, both the ROC-Curve and the PRC-Curve are suitable measures of performance. For the ClinTox dataset, the ROC-Curve may be the better measure, while for the BBBP dataset the PRC-Curve may be preferable. For the BACE dataset, both metrics are effective measures of performance.

In the comparative evaluation of GMPP-NN against the other studies, we used the ROC-Curve as the metric for all dataset tasks.

3.3 Comparative analysis of GMPP-NN performance with three studies

We tested our architecture on four different task datasets (HIV, BACE, BBBP, and ClinTox) and evaluated the ROC-Curve score on the test set of each dataset. We compare our results to those of three previous studies.

Table 4 The performances over all datasets

The performance of the GMPP-NN model is compared with the SMILES transformer (ST), FP2VEC (fingerprint to vector), deeper graph neural network (Deeper-GCN), geometry-enhanced molecular representation (GEM), and atom-bond transformer-based message-passing neural network (ABT-MPNN) using the ROC-Curve metric in Table 4. The findings show that GMPP-NN produced the highest ROC-Curve values on three of the four datasets. On the HIV dataset, GMPP-NN achieved an ROC-Curve of 0.8677, while ST, FP2VEC, Deeper-GCN, ABT-MPNN, and GEM achieved 0.683, 0.785, 0.789, 0.809, and 0.769, respectively. On the BACE dataset, GMPP-NN obtained an ROC-Curve of 0.8608, whereas ST, FP2VEC, and GEM achieved 0.719, 0.883, and 0.856, respectively. On the BBBP dataset, GMPP-NN achieved an ROC-Curve of 0.9186, whereas ST, FP2VEC, and GEM obtained 0.900, 0.911, and 0.724, respectively. Lastly, on the ClinTox dataset, GMPP-NN achieved the highest ROC-Curve of 0.9795, while ST, FP2VEC, Deeper-GCN, ABT-MPNN, and GEM achieved 0.963, 0.803, 0.870, 0.904, and 0.825, respectively. These findings suggest that GMPP-NN, based on the MPNN model with the molecular graph as the featurization method, is a promising architecture for molecular property prediction, as it achieved the highest performance on the majority of the datasets evaluated. However, the choice of featurization method and embedding model can considerably influence the architecture's performance; different featurization techniques and models can have varying impacts on the desired outcomes and require careful evaluation.

4 Conclusion

We developed and applied an architecture, GMPP-NN (graph molecular property prediction neural network), to chemical drug classification tasks, where it showed strong performance in the majority of cases. To the best of our knowledge, this work demonstrates how the MPNN-based component of the architecture can extract a variety of chemical properties from molecular graphs. We executed multiple tests on four binary classification benchmark datasets labeled with physiological and biophysical properties. In detail, a molecular graph constructed from the SMILES dataset is used as input to the MPNN model to obtain a graph embedding, and an MLP classifier (multilayer perceptron classifier) performs the binary classification. We evaluated our study on two metrics, ROC AUC and PRC AUC, and compared GMPP-NN to the other models using the ROC AUC metric. We found that our architecture performs better than the SMILES transformer (ST), FP2VEC (fingerprint to vector), deeper graph neural network (Deeper-GCN), geometry-enhanced molecular representation (GEM), and atom-bond transformer-based message-passing neural network (ABT-MPNN). We anticipate that our findings will serve as a new reference for molecular property prediction in the drug discovery process, specifically for classification tasks. Our model can help avoid investing resources in molecules with unfavorable properties, reduce the number of experimental failures, and contribute to a better understanding and profiling of drug safety by helping to identify potential risks early in the discovery and development process.