1 Introduction

Smart Manufacturing represents a revolutionary shift in industrial production, leveraging advanced technologies such as the Internet of Things (IoT), artificial intelligence (AI), and big data analytics to create highly automated and interconnected production environments. This integration allows for real-time data acquisition and analysis, essential for optimizing production lines, predicting maintenance needs, and ensuring quality control. Dynamic Graph Neural Networks (DGNNs) enhance these capabilities by learning from the temporal dynamics within the manufacturing environment. Unlike static models, DGNNs adapt to ongoing changes, providing more accurate predictions. For example, by modeling inter-component relationships within a production line as a dynamic graph, DGNNs can foresee potential failures or bottlenecks. This predictive insight enables preemptive actions to minimize downtime and boost productivity, thus maintaining continuous and efficient operations. By adopting DGNNs, Smart Manufacturing not only becomes more efficient but also shifts towards proactive maintenance strategies, reducing the risk of costly unplanned disruptions.

2 Related work

Initially, research on DGNNs built primarily upon the successes achieved in static graph research, which demonstrated notable outcomes in downstream tasks such as link prediction. Among these efforts, the authors of [14] designed the EvolveGCN model, which no longer feeds the node features learned by the GCN into the RNN; instead, the GCN parameters of the previous time step are used as the input of the RNN, and the output of the RNN serves as the GCN parameters of the next time step. This design effectively reduces the hardware burden of the dynamic graph model and improves its accuracy.
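As a rough illustration of this weight-evolution idea (a minimal PyTorch sketch, not the reference implementation of [14]; the class and variable names are our own), the GCN weights of the previous time step serve as the hidden state of a recurrent cell, and the evolved weights are then used for an ordinary graph convolution:

```python
import torch
import torch.nn as nn

class EvolvingGCNLayer(nn.Module):
    """Sketch of an EvolveGCN-style layer: a recurrent cell evolves the GCN weights over time.

    Treating each row of the weight matrix as a batch element of a GRU cell is our own
    simplification of the matrix recurrence used in the original model.
    """

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.rnn = nn.GRUCell(input_size=out_dim, hidden_size=out_dim)

    def forward(self, A_hat, X, W_prev):
        # W_prev: GCN weights from the previous time step, shape (in_dim, out_dim).
        W_t = self.rnn(W_prev, W_prev)         # weights act as both input and hidden state
        H_t = torch.relu(A_hat @ X @ W_t)      # standard GCN propagation with evolved weights
        return H_t, W_t
```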

Tensor decomposition is widely used to model high-order correlations among variables. Takeuchi et al. [15] further found that tensor decomposition can mine the temporal change patterns of objects. Inspired by this, Shi et al. [16] designed the GAEN model, which combines the Graph Attention Network (GAT) [17] with the Gated Recurrent Unit (GRU). To address the over-smoothing that arises as graph models deepen, the authors of [20] proposed the Jumping Knowledge Network (JK-Net) framework. This architecture allows the embeddings from each layer to connect directly to the output layer during the iterative aggregation process, and the final output is generated by aggregating these connections with a specific aggregation function. Each node adaptively selects information from its higher-order neighbors, ensuring that the influence of each node's neighborhood is distinct, which preserves node diversity and, to some extent, mitigates the over-smoothing phenomenon.

Rong et al. [21] proposed another model, DropEdge, based on the idea of cutting edges. The design of the model is relatively simple: a number of edges are randomly removed from the original graph. Removing edges makes the connections between nodes sparser and reduces the frequency of information exchange between nodes, which effectively alleviates the over-smoothing phenomenon as model layers are stacked.
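A minimal sketch of this edge-dropping idea (our own illustrative NumPy code, not the DropEdge reference implementation, assuming an undirected graph stored as a dense symmetric adjacency matrix):

```python
import numpy as np

def drop_edge(A, drop_rate=0.2, rng=None):
    """Randomly remove a fraction of the edges of an undirected adjacency matrix."""
    rng = np.random.default_rng() if rng is None else rng
    A = A.copy()
    i, j = np.triu_indices_from(A, k=1)       # candidate (upper-triangular) node pairs
    edges = np.flatnonzero(A[i, j] > 0)       # positions of existing edges
    n_drop = int(drop_rate * edges.size)
    dropped = rng.choice(edges, size=n_drop, replace=False)
    A[i[dropped], j[dropped]] = 0             # remove each sampled edge in both directions
    A[j[dropped], i[dropped]] = 0
    return A
```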

In order to effectively mine the information of high-order neighbors, Sun et al. [22] designed the AdaGCN model. Because this model introduces the AdaBoost method to integrate the information of each neighborhood order, it can effectively absorb information from different levels while building a deep graph model. This knowledge of high-order neighbors not only improves prediction accuracy but also avoids, to a certain extent, the over-smoothing of the original GCN.

In summary, researchers have proposed various solutions to alleviate over-smoothing in dynamic graph neural networks, such as adaptive information selection, edge cutting, and integration of high-order neighbor information.

3 Problem statement

A salient challenge in crafting dynamic GNN models lies in the intricate task of combining the temporal and spatial dimensions of dynamic graphs. While a naive stacking of graph neural network layers, as applied to static graphs, might yield satisfactory results, this approach fails to account for the potential variations in node connections across successive time steps in dynamic environments. For example, in the dynamic collaboration network shown in Fig. 1, nodes represent authors, and edges between nodes represent authors' collaborative relationships. At time step \(t_{0}\), author A has a cooperative relationship with author B, and author B also has a cooperative relationship with author C. At time step \(t_{1}\), author A and author C cooperate directly, bypassing author B, while at time step \(t_{2}\) the cooperative relationships among A, B, and C return to the state of time step \(t_{0}\). The embedding at the previous time step \(t_{1}\) may therefore not effectively help compute the embedding at time step \(t_{2}\). For this reason, we use an RNN to process the time series and introduce a residual system to ensure the stability of the embeddings.

Figure 1: Dynamic Collaboration Network

In addition, changes in node characteristics in dynamic networks can affect higher-order neighbors over time. Take, for instance, the traffic speed graph depicted in Fig. 2, where nodes represent traffic speed monitors and edges signify direct road connections between these monitors. At time step \(t_{0}\), a severe traffic accident occurs on the road segment between monitor A and monitor B, resulting in a significant traffic jam (highlighted in red). Consequently, the average speeds recorded by monitors A and B exhibit abnormal changes. At time step \(t_{1}\), this congestion is likely to propagate along the roadway: monitors directly connected to A and B start detecting the traffic jam, and by the subsequent time step \(t_{2}\) the disturbance spreads to adjacent monitors. As time progresses, the neighboring nodes of A and B are impacted to varying degrees.

Figure 2: Dynamic Traffic Situation

In response to these dynamics, our research aims to address the differential impacts exerted by each neighborhood order on node embeddings. Rather than merely stacking traditional Graph Neural Network (GNN) layers, we employ AdaBoost to adaptively integrate information across different neighborhood orders. This approach enables us to effectively combine the temporal and spatial dimensions of the dynamic graph relationships, thereby enhancing the model’s ability to interpret and predict changes in network states caused by incidents such as traffic jams. Below we will introduce the detailed design of each module of BDGNN.

4 Method

4.1 Model overall design

The overarching schematic of the model is illustrated in Fig. 3 and encompasses three parts: a modified graph convolutional network without activation functions, RNN-based processing of temporal data, and an analogous RNN-like architecture that handles higher-order spatial-domain information. The model accepts a feature matrix X and a sequence of normalized adjacency matrices as input. It is structured with L layers, each of which is considered an independent learner. The l-th layer uses RNNs together with residual modules to capture the temporal information inherent in the graph sequence, and a two-layer perceptron \(g_{\theta}\) to extract topological information from \(\boldsymbol{\hat{A}}^{l}\boldsymbol{X}\), producing an output \(\boldsymbol{H}^{(l)}\). After training, the node weights \(w^{l}\) and the neural network parameters \(\theta _{l}\) are propagated to the next layer's learner, which then learns from \(\boldsymbol{\hat{A}}^{l+1}\boldsymbol{X}\). After the L-th layer learner has been trained, an AdaBoost algorithm, chosen in accordance with the specific downstream task, integrates the features from all L layers to derive the model's final output H.

Figure 3: Overall Model Architecture
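To make this layer-wise flow concrete, the following highly simplified sketch (our own reading of the architecture, with hypothetical function and variable names) shows how, at a single time step, each of the L learners consumes \(\boldsymbol{\hat{A}}^{l}\boldsymbol{X}\) and how the per-layer outputs are handed to the AdaBoost-style combination of Sect. 4.4; the recurrent weight evolution and the residual system of Sect. 4.3 are omitted here:

```python
import torch

def bdgnn_step(A_hat, X, layers, combine):
    """One time step of the layer-wise BDGNN flow (illustrative only).

    A_hat   : normalized adjacency matrix of the current snapshot (N x N)
    X       : node feature matrix (N x F)
    layers  : list of L two-layer perceptrons g_theta, one per neighborhood order
    combine : AdaBoost-style aggregation of the per-layer outputs (Sect. 4.4)
    """
    outputs = []
    A_power = torch.eye(A_hat.shape[0], dtype=A_hat.dtype)
    for g_theta in layers:                  # l = 1, ..., L base learners
        A_power = A_power @ A_hat           # A_hat^l
        H_l = g_theta(A_power @ X)          # H^(l) = g_theta(A_hat^l X), cf. Eq. (4)
        outputs.append(H_l)
    return combine(outputs)                 # final output H
```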

4.2 Modified graph convolutional network without activation function

The proposed model employs a modified Graph Convolutional Network (GCN) to process and analyze the neighborhood information inherent in the graph topology. The GCN is the most widely adopted and fundamental model among graph neural network architectures, and it is constructed by stacking multiple graph convolution layers. Specifically, the l-th layer receives both the adjacency matrix A and the node embedding matrix from the previous layer, \(\boldsymbol{H}^{(l)}\), as inputs. Under the action of the weight matrix \(\boldsymbol{W}^{(l)}\), it updates the representation from \(\boldsymbol{H}^{(l)}\) to \(\boldsymbol{H}^{(l+1)}\), which is then propagated as the output. This process is captured by the following formulation:

$$ \begin{aligned} \boldsymbol{H}^{(l+1)}&=\operatorname{GCONV}\bigl( \boldsymbol{A},\boldsymbol{H}^{(l)},\boldsymbol{W}^{(l)}\bigr) \\ &:=\sigma \bigl(\boldsymbol{\hat{A}}\boldsymbol{H}^{(l)}\boldsymbol{W}^{(l)}\bigr), \end{aligned} $$
(1)

where \(\boldsymbol{\hat{A}}\) represents the normalization of A, which is defined as:

$$ \begin{aligned} &\boldsymbol{\hat{A}}=\boldsymbol{\tilde{D}}^{-\frac{1}{2}}\boldsymbol{ \tilde{A}} \boldsymbol{\tilde{D}}^{-\frac{1}{2}}, \\ &\boldsymbol{\tilde{A}}=\boldsymbol{A}+\boldsymbol{I}, \\ &\boldsymbol{\tilde{D}}=\operatorname{diag}\biggl(\sum_{j} \boldsymbol{\tilde{A}}_{ij}\biggr). \end{aligned} $$
(2)

σ is the activation function (usually ReLU) of all layers except the output layer. The first layer of the model usually treats the node feature matrix X as the node embedding matrix \(\boldsymbol{H}^{(0)}\).
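For reference, Eqs. (1)–(2) translate directly into a few lines of NumPy (an illustrative sketch with ReLU as the activation σ):

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetric normalization of Eq. (2): A_hat = D~^{-1/2} (A + I) D~^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))  # D~^{-1/2} on the diagonal
    D_inv_sqrt = np.diag(d_inv_sqrt)
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

def gconv(A, H, W, activation=lambda x: np.maximum(x, 0.0)):
    """One graph convolution layer, Eq. (1): H^(l+1) = sigma(A_hat H^(l) W^(l))."""
    return activation(normalize_adjacency(A) @ H @ W)
```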

The modified GCN we employed incorporates two specific computational modifications inspired by AdaGCN:

1) The activation function is removed. Since the feature representation of each node is usually a sparse one-dimensional vector rather than an image (which intuitively requires a deep convolutional network to extract high-level representations for visual tasks), the ideal representation of a node does not necessarily require many nonlinear transformations. Wu et al. [23] were driven by the same reasoning when designing the Simple Graph Convolution (SGC) model, which removes the nonlinear activation functions between GCN layers:

$$ \begin{aligned} \boldsymbol{H}^{(l)}=\boldsymbol{\hat{A}}^{l} \boldsymbol{X}\boldsymbol{W}^{(0)}\boldsymbol{W}^{(1)}\cdots \boldsymbol{W}^{(l)}= \boldsymbol{\hat{A}}^{l}\boldsymbol{X}\boldsymbol{\tilde{W}}. \end{aligned} $$
(3)

2) However, with ReLU removed as in SGC, the stacked linear transformations of graph convolution are insufficient for integrating high-order neighbor information. Therefore, we use a two-layer fully connected neural network \(g_{\theta}\) in place of the linear transformation W to compensate for the lack of nonlinear transformations in the GCN.

Based on the above two changes, the input-output relationship of the l-th layer GCN can be expressed as:

$$ \boldsymbol{H}^{(l)}_{t}=g_{\theta}\bigl(\boldsymbol{ \hat{A}}_{t}^{l}\boldsymbol{X}\bigr). $$
(4)
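A minimal PyTorch sketch of Eq. (4) (our own illustrative code with hypothetical layer sizes): features are propagated l times over \(\boldsymbol{\hat{A}}_{t}\) without intermediate activations, and the two-layer fully connected network \(g_{\theta}\) then supplies the nonlinearity:

```python
import torch
import torch.nn as nn

class ModifiedGCNLayer(nn.Module):
    """l-th base learner of Eq. (4): H_t^(l) = g_theta(A_hat_t^l X)."""

    def __init__(self, in_dim, hidden_dim, out_dim, order):
        super().__init__()
        self.order = order                           # neighborhood order l
        self.g_theta = nn.Sequential(                # two-layer fully connected network
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, A_hat_t, X):
        Z = X
        for _ in range(self.order):                  # linear propagation, no activation
            Z = A_hat_t @ Z                          # builds A_hat_t^l X step by step
        return self.g_theta(Z)
```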

4.3 Modified recurrent neural network with residual system

We utilize Recurrent Neural Networks (RNNs) to capture and analyze the temporal information present in graph sequences. An RNN is a neural network specialized for processing sequence data. Classical RNN models include the Long Short-Term Memory (LSTM) network and the Gated Recurrent Unit (GRU). The LSTM can be expressed as:

$$\begin{aligned} & \boldsymbol{f}_{t}=\sigma \bigl({ \boldsymbol{W}_{f}}\cdot [{\boldsymbol{h}_{t-1}},{\boldsymbol{x}_{t}}]+{ \boldsymbol{b}_{f}}\bigr), \\ & \boldsymbol{i}_{t}=\sigma \bigl({\boldsymbol{W}_{i}}\cdot [{ \boldsymbol{h}_{t-1}},{\boldsymbol{x}_{t}}]+{ \boldsymbol{b}_{i}} \bigr), \\ & \boldsymbol{\widetilde{C}}_{t}=\tanh \bigl({\boldsymbol{W}_{C}}\cdot [{\boldsymbol{h}_{t-1}},{ \boldsymbol{x}_{t}}]+{\boldsymbol{b}_{c}} \bigr), \\ & \boldsymbol{C}_{t}={\boldsymbol{f}_{t}}*{\boldsymbol{C}_{t-1}}+{ \boldsymbol{i}_{t}}* \widetilde{{\boldsymbol{C}_{t}}}, \\ & \boldsymbol{o}_{t}=\sigma \bigl({\boldsymbol{W}_{o}}\cdot [{ \boldsymbol{h}_{t-1}},{\boldsymbol{x}_{t}}]+{ \boldsymbol{b}_{o}} \bigr), \\ & \boldsymbol{h}_{t}={\boldsymbol{o}_{t}}*\tanh ({ \boldsymbol{C}_{t}}), \end{aligned}$$
(5)

where t denotes the discrete time step, \(\boldsymbol{f}_{t}\) symbolizes the forget gate, and \(\boldsymbol{i}_{t}\) denotes the input gate. The cell state is represented by \(\boldsymbol{C}_{t}\), while \(\boldsymbol{o}_{t}\) corresponds to the output gate, and \(\boldsymbol{h}_{t}\) signifies the hidden state. The parameters of the model are encapsulated by W for weights and b for biases. The sigmoid function is employed as the activation function, and the symbol ∗ is used to denote the Hadamard product, which refers to element-wise multiplication within the model’s computations.

GRU can be expressed as:

$$ \begin{aligned} & {\boldsymbol{z}_{t}}=\sigma \bigl({ \boldsymbol{W}_{z}}[{\boldsymbol{h}_{t-1}},{\boldsymbol{x}_{t}}]+{ \boldsymbol{b}_{z}}\bigr), \\ & {\boldsymbol{r}_{t}}=\sigma \bigl({\boldsymbol{W}_{r}}[{ \boldsymbol{h}_{t-1}},{\boldsymbol{x}_{t}}]+{ \boldsymbol{b}_{r}} \bigr), \\ & \widetilde{{\boldsymbol{h}_{t}}}=\tanh \bigl({\boldsymbol{W}_{h}}[ \boldsymbol{r}_{t}*{\boldsymbol{h}_{t-1}},{ \boldsymbol{x}_{t}}]+{ \boldsymbol{b}_{h}}\bigr), \\ & {\boldsymbol{h}_{t}}=(1-{\boldsymbol{z}_{t}})*{\boldsymbol{h}_{t-1}}+{ \boldsymbol{z}_{t}}* \widetilde{{\boldsymbol{h}_{t}}}, \end{aligned} $$
(6)

where t denotes the discrete time step, \(\boldsymbol{z}_{t}\) is the update gate, \(\boldsymbol{r}_{t}\) signifies the reset gate, \(\boldsymbol{\widetilde{h}}_{t}\) represents the candidate hidden state, and \(\boldsymbol{h}_{t}\) is the final hidden state.
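For completeness, Eq. (6) can be transcribed directly (an illustrative NumPy sketch; in practice a library implementation such as torch.nn.GRU would normally be used):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x_t, h_prev, W_z, W_r, W_h, b_z, b_r, b_h):
    """One GRU step, Eq. (6); each weight matrix acts on the concatenation [h_{t-1}, x_t]."""
    z_t = sigmoid(W_z @ np.concatenate([h_prev, x_t]) + b_z)           # update gate
    r_t = sigmoid(W_r @ np.concatenate([h_prev, x_t]) + b_r)           # reset gate
    h_cand = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]) + b_h)  # candidate state
    return (1.0 - z_t) * h_prev + z_t * h_cand                         # new hidden state
```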

To prevent the particularities of individual time steps from degrading the performance of the overall dynamic system, we add a residual system \(\operatorname{Res}^{(l)}\) at each time step, implemented as the network weights from i time steps earlier, where i is usually 3. Similar to EvolveGCN, this paper also uses two versions to update the weights \(\boldsymbol{W}^{(l)}_{t}\) of the two-layer fully connected neural network in the base classifier.

The first version treats \(\boldsymbol{W}^{(l)}_{t}\) as the hidden state in the recurrent architecture, and the input state is the node embedding \(\boldsymbol{H}^{(l)}_{t}\). This version is denoted H, and its representation can be written as:

$$ \begin{aligned} \boldsymbol{W}^{(l)}_{t}= \operatorname{RNN}\bigl(\boldsymbol{H}^{(l)}_{t}, \boldsymbol{W}^{(l)}_{t-1}\bigr)+\operatorname{Res}^{(l)}_{t}. \end{aligned} $$
(7)

The second version treats \(\boldsymbol{W}^{(l)}_{t}\) as both the input and the output of the recurrent structure, instead of the node embedding. This version is denoted O, and its representation can be written as:

$$ \begin{aligned} \boldsymbol{W}^{(l)}_{t}= \operatorname{RNN}\bigl(\boldsymbol{W}^{(l)}_{t-1}\bigr)+ \operatorname{Res}^{(l)}_{t}. \end{aligned} $$
(8)
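The two update rules of Eqs. (7) and (8) can be sketched as follows (our own simplified PyTorch code; treating the rows of the weight matrix as batch elements of a GRU cell, and summarizing the node embedding row-wise in version H, are simplifications introduced for illustration):

```python
import torch
import torch.nn as nn

class WeightEvolver(nn.Module):
    """Sketch of Eqs. (7)/(8): evolve W_t^(l) with an RNN plus a residual term Res_t^(l)."""

    def __init__(self, weight_dim, version="H", res_lag=3):
        super().__init__()
        self.version = version   # "H": node embeddings drive the RNN; "O": weights only
        self.res_lag = res_lag   # the residual reuses the weights from i = res_lag steps earlier
        self.rnn = nn.GRUCell(input_size=weight_dim, hidden_size=weight_dim)
        self.history = []        # past weights, kept for the residual system

    def forward(self, W_prev, H_t=None):
        if self.version == "H":
            # Eq. (7): the weights are the hidden state and the embedding H_t^(l) is the input;
            # assumes the embedding dimension equals the column dimension of the weight matrix.
            inp = H_t.mean(dim=0, keepdim=True).expand_as(W_prev)
        else:
            # Eq. (8): the weights are both the input and the hidden state.
            inp = W_prev
        W_t = self.rnn(inp, W_prev)
        if len(self.history) >= self.res_lag:
            W_t = W_t + self.history[-self.res_lag]   # residual term Res_t^(l)
        self.history.append(W_t.detach())
        return W_t
```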

4.4 Higher-order neighborhood information processor

To handle and integrate high-order spatial information, we leverage the AdaBoost algorithm as an adaptive approach to processing neighborhood information. AdaBoost is an important ensemble learning technique whose main idea is to combine multiple weak learners with low prediction accuracy into a strong learner with high prediction accuracy. In comparison to other boosting methods employed in graph neural networks, AdaBoost offers distinct advantages, including adaptive feature importance, ensemble diversity, effective error handling, scalability, and versatility. These characteristics collectively make AdaBoost a powerful and advantageous choice.

After calculating the embedding representation of each GCN layer at time step t, we use AdaBoost to integrate them into the final embedding, and then select the algorithm corresponding to the downstream task to compute the final result.

If the task of the current dataset is node classification or link prediction, we use the AdaBoost classification algorithm SAMME.R [24]. The process is shown in Algorithm 1.

Algorithm 1: BDGNN based on the SAMME.R algorithm
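Algorithm 1 itself is not reproduced here, but the SAMME.R decision rule it relies on can be sketched as follows (illustrative NumPy code): each of the L base learners supplies class-probability estimates, and their contributions are summed before taking the arg max.

```python
import numpy as np

def samme_r_combine(prob_list):
    """Combine per-layer class probabilities with the SAMME.R decision rule.

    prob_list : list of L arrays of shape (n_samples, K) holding the class probabilities
                produced by the L base learners (one per neighborhood order).
    Each learner contributes (K - 1) * (log p - mean(log p)); the contributions are summed
    and the class with the largest total score is predicted.
    """
    K = prob_list[0].shape[1]
    score = np.zeros_like(prob_list[0])
    for p in prob_list:
        log_p = np.log(np.clip(p, 1e-12, None))                  # avoid log(0)
        score += (K - 1) * (log_p - log_p.mean(axis=1, keepdims=True))
    return score.argmax(axis=1)
```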

If the task of the current dataset is linear regression, the AdaBoost.R2 algorithm is used to aggregate the learners' regression predictions. The process is shown in Algorithm 2.

Algorithm 2: BDGNN based on AdaBoost.R2
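Similarly, the core of the AdaBoost.R2 aggregation can be sketched as follows (a condensed illustrative version that uses the linear loss and omits edge cases such as a learner with zero error): each learner receives a weight derived from its weighted average loss, and the final prediction is the weighted median of the learners' outputs.

```python
import numpy as np

def adaboost_r2_learner_weight(errors, sample_weights):
    """Weight of one regressor in AdaBoost.R2, from its per-sample absolute errors."""
    loss = errors / errors.max()              # linear loss, normalized to [0, 1]
    avg_loss = np.sum(sample_weights * loss)  # sample_weights are assumed to sum to 1
    beta = avg_loss / (1.0 - avg_loss)
    return np.log(1.0 / beta)                 # smaller average loss -> larger learner weight

def weighted_median_prediction(preds, learner_weights):
    """Aggregate the per-layer regression outputs for one sample by the weighted median."""
    order = np.argsort(preds)
    cum_w = np.cumsum(learner_weights[order])
    idx = np.searchsorted(cum_w, 0.5 * cum_w[-1])
    return preds[order][idx]
```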

5 Experiments

5.1 Data sets

The experimental design involves two different downstream tasks, and the dataset is further divided into two groups based on the properties of each downstream task. The first group consists of four datasets specified for the link prediction task.

Stochastic Block Model. (SBM for short) SBM is a commonly used stochastic graph model for simulating community structure and evolution. We use the data generated in the same manner as EvolveGCN.

Bitcoin OTC. (BC-OTC for short) The BC-OTC dataset is a Bitcoin user network. This dataset can be used to predict the polarity of each rating and whether a user will rate another user at the next time step.

Bitcoin Alpha. (BC-Alpha for short) The BC-Alpha is created in the same way as BC-OTC, except that the users and ratings come from a different trading platform.

UC Irvine messages. (UCI for short) The UCI dataset encapsulates an online social network comprising students from the University of California, Irvine, where the exchange of messages between users is depicted through the network’s links. Link prediction emerges as the quintessential task associated with this dataset.

The second group consists of two datasets for traffic speed regression.

SZ-taxi. This data set contains taxi speed data from Shenzhen for January 1 to January 31, 2015; the traffic speed of each road is recorded every 15 minutes.

Los-loop. This data set records the speeds measured by 207 sensors on Los Angeles highways from March 1 to March 7, 2012; the traffic speed of each road is recorded every 5 minutes.

Table 1 shows 4 dynamic graph datasets for link prediction tasks and Table 2 shows 2 dynamic graph datasets for traffic speed regression tasks.

Table 1 Experiment datasets for link prediction
Table 2 Experiment datasets for traffic speed regression

5.2 Evaluation indicators

To evaluate the prediction performance of the BDGNN model, we use several metrics. For the link prediction task, we use two experimental performance evaluation indicators:

  • Mean Average Precision (MAP):

    $$ \begin{aligned} \mathrm{MAP}=\frac{1}{K}\sum _{i=1}^{K}{\mathrm{AP}}_{i}, \end{aligned} $$
    (9)

    where \(\mathrm{AP}_{i}\) represents the prediction accuracy of each class, and K represents the total number of classes.

  • Mean Reciprocal Rank (MRR):

    $$ \begin{aligned} \mathrm{MRR}=\frac{1}{N}\sum _{i=1}^{N}\frac{1}{p_{i}}, \end{aligned} $$
    (10)

    where N represents the total number of samples, and \(p_{i}\) represents the rank position of the true item for sample i in the predicted ranking.

For traffic prediction tasks, we use four experimental performance evaluation indicators, where \(Y_{t}\) denotes the real traffic information and \(\hat{Y}_{t}\) the corresponding prediction (a short computation sketch of these four metrics is given after the list):

  • Root Mean Squared Error (RMSE):

    $$ \begin{aligned} \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum _{t=1}^{N} (Y_{t} - \hat{Y}_{t})^{2}}. \end{aligned} $$
    (11)
  • Mean Absolute Error (MAE):

    $$ \begin{aligned} \mathrm{MAE} = \frac{1}{N} \sum _{t=1}^{N} |Y_{t} - \hat{Y}_{t}|. \end{aligned} $$
    (12)
  • Accuracy:

    $$ \begin{aligned} \mathrm{Accuracy}=1-\frac{\|Y-\hat{Y}\|_{F}}{\|Y\|_{F}}. \end{aligned} $$
    (13)
  • Coefficient of Determination (R2):

    $$ \begin{aligned} R^{2} = 1 - \frac{\sum_{t=1}^{N} (Y_{t} - \hat{Y}_{t})^{2}}{\sum_{t=1}^{N} (Y_{t} - \overline{Y})^{2}}. \end{aligned} $$
    (14)
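As referenced above, the four regression indicators of Eqs. (11)–(14) can be computed with a few lines of NumPy (our own illustrative sketch):

```python
import numpy as np

def traffic_metrics(Y, Y_hat):
    """Eqs. (11)-(14): RMSE, MAE, Accuracy and R^2 for traffic speed prediction.

    Y, Y_hat : arrays of ground-truth and predicted speeds with the same shape.
    """
    diff = Y - Y_hat
    rmse = np.sqrt(np.mean(diff ** 2))                           # Eq. (11)
    mae = np.mean(np.abs(diff))                                  # Eq. (12)
    accuracy = 1.0 - np.linalg.norm(diff) / np.linalg.norm(Y)    # Eq. (13), Frobenius norms
    r2 = 1.0 - np.sum(diff ** 2) / np.sum((Y - Y.mean()) ** 2)   # Eq. (14)
    return {"RMSE": rmse, "MAE": mae, "Accuracy": accuracy, "R2": r2}
```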

5.3 Results

5.3.1 Link prediction

This paper compares the proposed method with six models on the link prediction task: GCN, GCN-GRU, DynGEM, dyngraph2vecAERNN [25], EvolveGCN-H, and EvolveGCN-O. GCN, GCN-GRU, and the two versions of EvolveGCN all use two-layer graph neural networks, while the two versions of BDGNN use a stack of five learners. Table 3 and Table 4 show the comparative experimental results. On the SBM and UCI datasets, BDGNN goes further than EvolveGCN and achieves better results. However, there is still an obvious gap with the unsupervised model represented by dyngraph2vecAERNN on the BC-OTC and BC-Alpha datasets, which suggests that unsupervised models relying solely on graph structure may learn better representations on these datasets.

Table 3 Link prediction experiment MAP comparison
Table 4 Link prediction experiment MRR comparison

5.3.2 Traffic speed regression

In the traffic speed regression task, we compare the performance of BDGNN with five models: GCN, GCN-GRU, DCRNN [26], T-GCN [27], and AST-GCN [28]. Table 5 and Table 6 show the comparative experimental results. The results show that the prediction accuracy of both versions of BDGNN is higher than that of all other models on the SZ-taxi and Los-loop datasets. The results of version H are better and more stable than those of version O, which reflects that the information of temporal evolution is more important than the information of topological structure in these two traffic speed datasets.

Table 5 Comparison of experimental indicators of SZ-taxi
Table 6 Comparison of experimental indicators of Los-loop

5.3.3 Impact of BDGNN on over-smoothing

To confirm that BDGNN mitigates the over-smoothing issue, this study also evaluates the MAP results across various model depths, contrasting BDGNN-O with other GCN-derived models on both the SBM and UCI datasets. Figure 4 illustrates that, as the number of layers increases, the prediction performance of standard GCN models deteriorates markedly due to over-smoothing; the two EvolveGCN variants counteract this effect by integrating dynamic graph information, but they still exhibit a general declining trend. In contrast, BDGNN demonstrates a continuous enhancement in performance with additional layers. As depicted in Fig. 5, beyond the fourth layer the MAP performance of GCN and the two versions of EvolveGCN falls sharply, and after the fifth layer their MAP scores approach zero, whereas BDGNN maintains stable performance. This stability suggests that BDGNN can be selective in absorbing information from neighbors at different hops, thus maintaining a balance and, to a notable extent, averting the over-smoothing characteristic of traditional GCN models.

Figure 4: MAP and Model Depth Relationship of SBM Dataset

Figure 5: MAP and Model Depth Relationship of UCI Dataset

6 Discussion

Compared to existing methods, the proposed BDGNN demonstrates superior performance on both the SBM and UCI datasets for link prediction. This outcome suggests that incorporating residual systems effectively mitigates the impact of noise in dynamic networks, thereby enhancing prediction accuracy. On the traffic speed regression datasets, BDGNN outperforms the other models across all four evaluation metrics, indicating that integrating high-order neighbor information through AdaBoost allows the regression to anticipate changes in that information, resulting in more stable predictions. Additionally, in the GCN layer comparison experiment, BDGNN exhibits performance improvement rather than degradation as layers are added, providing evidence for its effectiveness in alleviating over-smoothing. However, due to the scarcity of dynamic network data in smart manufacturing, the experimental setup may lead to unsatisfactory model adaptability.

7 Conclusions

This study introduces the BDGNN model, specifically engineered to surmount the limitations inherent in dynamic graph neural networks, such as their inadequate capacity to integrate information from high-order neighbors, the weak interplay between temporal and topological data, and suboptimal predictive accuracy. BDGNN leverages a Recurrent Neural Network (RNN) to process temporal dynamics of graphs, while adopting an approach akin to the AdaBoost algorithm with an RNN-like architecture to assimilate information from multi-order neighbors within the spatial domain. The results validate that the BDGNN model adeptly navigates both temporal dynamics and graph topology, surpassing extant approaches in tasks such as link prediction and traffic speed prediction through linear regression. The enhanced accuracy of dynamic graph neural networks in link prediction and regression tasks brings significant benefits to intelligent manufacturing. It enables optimized logistics and production processes, improves product quality and consistency, and effectively supports decision-making and resource allocation, thereby driving the development and innovation of smart manufacturing. Future endeavors will pivot towards the selection or crafting of novel GNN architectures that are attuned to the AdaBoost strategy, aiming to fully harness the topological and dynamic facets of dynamic graphs, thus rectifying the deficiencies observed in conventional GCN methodologies.