Abstract
Graph neural networks (GNNs) have achieved remarkable results for various graph learning tasks. However, one of the recent challenges for GNNs is to adapt to different types of graph inputs, such as heterophilic graph datasets in which linked nodes are more likely to have different class labels and dissimilar features. Accordingly, an ideal GNN model should adaptively accommodate all types of graph datasets with different labeling distributions. In this paper, we tackle this challenge by proposing a regularization framework on graph framelets with the regularizer induced from the graph p-Laplacian. By adjusting the value of p, the p-Laplacian based regularizer restricts the solution space of the graph framelet to the desirable region based on the graph homophilic features. We propose an algorithm to effectively solve a more generalized regularization problem and prove that the algorithm imposes a (p-Laplacian based) spectral convolution and a diagonal scaling operation on the framelet-filtered node features. Furthermore, we analyze the denoising power of the proposed model and compare it with the predefined framelet denoising regularizer. Finally, we conduct empirical studies to show the prediction power of the proposed model on both homophilic undirected and heterophilic directed graphs, with and without noise. Our proposed model shows significant improvements over multiple baselines, which suggests the effectiveness of combining graph framelets and the p-Laplacian.
1 Introduction
Graph neural networks (GNNs) have demonstrated remarkable ability for graph learning tasks [5, 53, 58, 65]. The input to GNNs is so-called graph data, which records useful features and structural information among data points. Such data are widely seen in many fields, such as biomedical science [1], social networks [16], and recommender systems [32, 51, 53, 54, 68]. Many GNNs are designed based on the homophily assumption, i.e., nodes with similar features or labels are often linked with each other. This phenomenon is commonly observed in citation networks [13], which have been widely used as benchmarks in GNN empirical studies. However, the homophily assumption may not always hold, and its opposite, heterophily, is observed quite often in many real-world datasets, in which linked nodes are more likely to have different class labels and features [59]. For example, in online transaction networks, fraudsters are more likely to link to customers than to other fraudsters [42]. GNNs designed under the homophily assumption are deemed unsuitable for heterophilic graphs, as is evident from their significant performance degradation on them [59]. The reason is that heterophilic graphs contain heterogeneous instances, and hence the signals should be sharpened rather than smoothed out. An ideal framework for learning on graphs should be able to accommodate both homophilic and heterophilic scenarios.
One active line of work addressing the GNN adaptation challenge regularizes the solution of GNNs from the perspective of optimization. The work by [67] unified most state-of-the-art GNNs under an optimization framework. Furthermore, one recent work [18] considered assigning an adjustable p-Laplacian regularizer to the (discrete) graph regularization problem that is conventionally treated as a way of producing GNN outcomes (i.e., Laplacian smoothing). In view of the fact that the classic graph Laplacian regularizer measures the graph signal energy along edges under the \(L_2\) metric, it would be beneficial if GNNs could be adapted to heterophilic graphs under an \(L_p\) metric (\(1 \le p < 2\)). Given that the \(L_1\) metric is more robust to high-frequency signals, a higher model discriminative power is preserved. The early work [27] demonstrated the advantage of adopting the \(L_1\) metric in the Locality Preserving Projection (LPP) model. In addition, the recently proposed p-Laplacian GNN [18] adaptively modifies aggregation weights by exploiting the variance of node embeddings on edges measured by the graph gradient. With a further investigation into the application of the p-Laplacian, [36] suggested an efficient gradient descent optimization strategy to construct the p-Laplacian embedding space that guarantees convergence to viable local solutions. Although the p-Laplacian based optimization scheme has been successfully incorporated into a basic GNN model [18], whether it can be further incorporated into more advanced GNN models (i.e., graph framelets) with higher prediction power and flexibility is still unclear. Specifically, one may be interested in identifying whether the advantage of deploying the p-Laplacian in GNNs remains in graph framelets, or what the exact functionality of the p-Laplacian is when interacting with the feature representations generated in both the low-pass and high-pass framelet domains.
In addition, as graph framelets have shown great potential for graph noise reduction [56, 61, 69], whether the inclusion of a p-Laplacian regularizer would enhance or dilute this power remains unclear. These research gaps inspire us to incorporate the p-Laplacian into graph framelets and further explore the underlying relationship between them.
Since the p-Laplacian regularized optimization problem lacks a closed-form solution except when \(p = 2\) [64], an iterative algorithm is suggested to estimate the solution, and each iteration can be analogized to a GNN layer [62]. The solution of such an algorithm is defined by the first-order condition of the optimization problem. As a result, one can relate this to the implicit layer approach [10, 57], which has the potential of avoiding over-smoothing since the adjusted node features are re-injected at each iteration step. By adjusting the value of p, both the p-Laplacian GNN and the algorithm are capable of producing learning outcomes with different discriminative levels and are thus able to handle both homophilic and heterophilic graphs. Related work [17] presents a framework based on Bregman layers to fulfill bi-level optimization formulations. To reap the benefit of p-Laplacian regularization in the framelet domain, in this paper we propose two p-Laplacian regularized framelet GCN models, named p-Laplacian Undecimated Framelet GCN (pL-UFG) and p-Laplacian Fourier Undecimated Framelet GCN (pL-fUFG). In these models, the p-Laplacian regularization can be applied on either the framelet domain or the well-known framelet Fourier domain. The p-Laplacian based regularizer incurs a penalty on both the low- and high-frequency domains created by the framelet, producing flexible models that are capable of adapting to different types of graph inputs (i.e., homophilic and heterophilic, directed and undirected). We summarize our contributions as follows:
-
We define two types of new GNN layers by introducing p-Laplacian regularizers to both decomposed and reconstructed framelets. This paves the way for introducing more general Bregman divergence regularization in the graph framelet framework;
-
We propose an iterative algorithm to fit the proposed models and explore an alternative algorithm developed from the F-norm LPP;
-
We prove that the iteration of the proposed algorithm provides a sequence of spectral graph convolutions and diagonal scalings over framelet-filtered graph signals, which gives a deeper explanation of how the p-Laplacian regularizer interacts with the framelet;
-
We connect the proposed p-Laplacian regularizer to the previously studied framelet regularizer to illustrate the denoising power of the proposed model.
-
We investigate the performance of the new models on graph learning tasks for both homophilic (undirected) graphs and heterophilic (directed) graphs. To the best of our knowledge, we are the first to explore the possibility of applying framelet GCNs to directed graphs. The experimental results demonstrate the effectiveness of pL-UFG and pL-fUFG on real-world node classification tasks with strong robustness.
The rest of the paper is organized as follows. In Sect. 2, we introduce the p-Laplacian operator and regularized GNNs, followed by a review of recent studies on regularized graph neural networks, including implicit layers and graph homophily. In Sect. 3, we introduce the fundamental properties of graph framelets and propose the p-Laplacian regularized framelet models. Furthermore, we develop an algorithm to solve a regularization problem that is more general than p-Laplacian regularization. We also provide theoretical analysis to show how the p-Laplacian based regularizer interacts with the graph framelet. At the end of Sect. 3, a brief discussion on the denoising power of the proposed model is presented. Lastly, we present the experimental results and analysis in Sect. 5. Finally, this paper is concluded in Sect. 6.
2 Preliminaries
This section provides an in-depth exploration of the fundamental concepts, encompassing graphs, graph framelets, and regularized graph neural networks. For the sake of reader comprehension and ease of following the intended ideas, each newly introduced model and definition will be accompanied by a succinct review of its developmental history.
2.1 Basic notations
Let \(\mathcal {G} = (\mathcal {V}, \mathcal {E}, W)\) denote a weighted graph, where \(\mathcal {V} = \{v_1, v_2, \cdots , v_N \}\) and \(\mathcal {E} \subseteq \mathcal {V} \times \mathcal {V}\) represent the node set and the edge set, respectively. \(\textbf{X} \in {\mathbb {R}}^{N \times c}\) is the feature matrix for \(\mathcal {G}\) with \(\{ \textbf{x}_1, \textbf{x}_2, \cdots , \textbf{x}_N \}\) as its rows, and \(W = [w_{i,j}] \in \mathbb {R}^{N \times N}\) is the weight matrix on edges with \(w_{i,j}>0\) if \((v_i, v_j) \in \mathcal {E}\) and \(w_{i,j} = 0\) otherwise. For undirected graphs, we have \(w_{i,j} = w_{j,i}\) which means that W is a symmetric matrix. For directed graphs, it is likely that \(w_{i,j} \ne w_{j, i}\) which means that W may not be a symmetric matrix. In most cases, the weight matrix is the graph adjacency matrix, i.e., \(w_{i,j} \in \{0, 1\}\) with elements \(w_{i,j}=1\) if \((v_i, v_j) \in \mathcal {E}\) and \(w_{i,j} = 0\) otherwise. Furthermore, let \(\mathcal {N}_i = \{v_j: (v_i,v_j)\in \mathcal {E}\}\) denote the set of neighbours of node \(v_i\) and \(\textbf{D} = \text {diag}(d_{1,1},..., d_{N,N})\in \mathbb {R}^{N\times N}\) denote the diagonal degree matrix with \(d_{i,i} = \sum ^N_{j=1}w_{i,j}\) for \(i = 1,..., N\). The normalized graph Laplacian is defined as \(\widetilde{\textbf{L}} = \textbf{I}- \textbf{D}^{-\frac{1}{2}} (\textbf{W}+ \textbf{I}) \textbf{D}^{-\frac{1}{2}}\). Lastly, for any vector \(\textbf{x}= (x_1,..., x_c)\in \mathbb {R}^c\), we use \(\Vert \textbf{x}\Vert _2 = (\sum ^c_{i=1} x^2_i)^{\frac{1}{2}}\) to denote its L\(_2\)-norm, and similarly for any matrix \(\textbf{M}= [m_{i,j}]\), \(\Vert \textbf{M}\Vert :=\Vert \textbf{M}\Vert _F = (\sum _{i,j}m^2_{i,j})^{\frac{1}{2}}\) is used to denote its Frobenius norm.
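The notation above can be made concrete with a short sketch; the function name and the toy path graph are our own illustrative choices, and the Laplacian follows the definition stated in this subsection (with \(d_{i,i} = \sum_j w_{i,j}\)):

```python
import numpy as np

def normalized_laplacian(W):
    """Normalized graph Laplacian as defined above:
    L = I - D^{-1/2} (W + I) D^{-1/2}, with d_ii = sum_j w_ij."""
    N = W.shape[0]
    d = W.sum(axis=1)
    d_inv_sqrt = np.zeros(N)
    d_inv_sqrt[d > 0] = d[d > 0] ** -0.5
    D_inv_sqrt = np.diag(d_inv_sqrt)
    return np.eye(N) - D_inv_sqrt @ (W + np.eye(N)) @ D_inv_sqrt

# Toy undirected path graph v1 - v2 - v3 (weights = adjacency).
W = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
L = normalized_laplacian(W)
assert np.allclose(L, L.T)        # symmetric for undirected graphs
assert np.isclose(L[1, 1], 0.5)   # 1 - 1/d_22 with d_22 = 2
```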
2.2 Consistency in graphs
Generally speaking, most GNN frameworks are designed under the homophily assumption, in which the labels of nodes and their neighbours in the graph are mostly identical. The recent work by [66] emphasises that general GNN architectures fail to obtain outstanding results on graphs whose connected nodes have different class labels and dissimilar features, which we call heterophilic or low-homophily graphs. The definition of homophilic and heterophilic graphs is given as follows:
Definition 1
(Homophily and Heterophily [18]) The homophily or heterophily of a network is used to describe the relationship between the labels of connected nodes. The level of homophily of a graph can be measured by \(\mathcal {H(G)} = \mathbb {E}_{i \in \mathcal {V}}[\vert \{j \in \mathcal {N}_{i}: y_j = y_i\}\vert /\vert \mathcal {N}_i \vert ]\), where \(\vert \{j \in \mathcal {N}_{i}: y_j = y_i\}\vert\) denotes the number of neighbours of \(i \in \mathcal {V}\) that share the same label as i, i.e., \(y_j = y_i\). \(\mathcal {H(G)} \rightarrow 1\) corresponds to strong homophily, while \(\mathcal {H(G)} \rightarrow 0\) indicates strong heterophily. We say that a graph is a homophilic (heterophilic) graph if it has strong homophily (heterophily).
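As a concrete sketch (the function name and toy data are our own), \(\mathcal {H(G)}\) can be estimated by averaging, over nodes, the fraction of neighbours that share each node's label:

```python
import numpy as np

def homophily(edges, labels):
    """Estimate H(G): for each node i, the fraction of neighbours sharing
    i's label, averaged over all nodes that have at least one neighbour."""
    N = len(labels)
    same = np.zeros(N)
    deg = np.zeros(N)
    for i, j in edges:                    # edges given as ordered pairs (i, j)
        deg[i] += 1
        same[i] += (labels[i] == labels[j])
    mask = deg > 0
    return float(np.mean(same[mask] / deg[mask]))

# Homophilic toy path graph: every neighbour shares the node's label.
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
print(homophily(edges, [0, 0, 0]))  # -> 1.0 (strong homophily)
print(homophily(edges, [0, 1, 0]))  # -> 0.0 (strong heterophily)
```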
2.3 p-Laplacian operator
The notion of the graph p-Laplacian and its use in regularization problems on graph-structured data can be traced back to [64]; this regularization scheme was later extended to build a flexible GNN learning model in [18]. In this paper, we adopt notation similar to that of [18] to define the graph p-Laplacian operator. We first define the graph gradient as follows:
Definition 2
(Graph Gradient) Let \({\mathcal {F}}_{\mathcal {V}}:=\{\textbf{F} \,\vert\, {\textbf{F}}: {\mathcal {V}} \rightarrow {\mathbb {R}}^d\}\) and \({\mathcal {F}}_{\mathcal {E}}:=\{\textbf{g} \,\vert\, {\textbf{g}}: {\mathcal {E}} \rightarrow {\mathbb {R}}^d\}\) be the function spaces on nodes and edges, respectively. Given a graph \({\mathcal {G}} = ({\mathcal {V}}, {\mathcal {E}}, W)\) and a function \({\textbf{F}} \in {\mathcal {F}}_{\mathcal {V}}\), the graph gradient is an operator \(\nabla _W: {\mathcal {F}}_{\mathcal {V}} \rightarrow {\mathcal {F}}_{\mathcal {E}}\) defined, for all \([i,j] \in {\mathcal {E}}\), by \((\nabla _W {\textbf{F}})([i,j]):= \sqrt{\frac{w_{i,j}}{d_{j,j}}}\, {\textbf{f}}_j - \sqrt{\frac{w_{i,j}}{d_{i,i}}}\, {\textbf{f}}_i,\)
where \({\textbf{f}}_i\) and \({\textbf{f}}_j\) are the signal vectors on nodes i and j, i.e., the rows of \({\textbf{F}}\).
Without confusion, we will simply denote \(\nabla _W\) as \(\nabla\) for convenience. For \([i,j] \notin \mathcal {E}\), \((\nabla {\textbf{F}})([i,j]):= 0\). The graph gradient of \({\textbf{F}}\) at a vertex i, \(\forall i \in \{1,..., N\}\), is defined as \(\nabla {\textbf{F}}(i):= ((\nabla {\textbf{F}})([i, 1]);\dots ; (\nabla {\textbf{F}})([i,N]))\), and its Frobenius norm is given by \(\Vert \nabla {\textbf{F}}(i) \Vert _2:= (\sum ^N_{j=1}(\nabla {\textbf{F}})^2([i,j]))^{\frac{1}{2}}\), which measures the variation of \({\textbf{F}}\) around node i. Note that the notation \(\nabla {\textbf{F}}\) has two interpretations: one as a graph gradient (over edges) and one as a node gradient (over nodes); the intended meaning can be inferred from the context in the rest of the paper. We also provide the definition of graph divergence, which is analogous to the continuous setting, where the Laplacian operator is the divergence of the gradient of a continuous function.
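The graph gradient and the node-gradient norm can be sketched numerically as follows; the helper names are our own, and the edge formula assumes the normalized gradient \((\nabla_W \textbf{F})([i,j]) = \sqrt{w_{i,j}/d_{j,j}}\,\textbf{f}_j - \sqrt{w_{i,j}/d_{i,i}}\,\textbf{f}_i\) of [18, 64]:

```python
import numpy as np

def graph_gradient(W, F):
    """(nabla_W F)([i,j]) = sqrt(w_ij/d_jj) f_j - sqrt(w_ij/d_ii) f_i,
    stored as an N x N x d tensor; zero when [i,j] is not an edge."""
    N, d = W.shape[0], W.sum(axis=1)
    G = np.zeros((N, N, F.shape[1]))
    for i in range(N):
        for j in range(N):
            if W[i, j] > 0:
                G[i, j] = np.sqrt(W[i, j] / d[j]) * F[j] - np.sqrt(W[i, j] / d[i]) * F[i]
    return G

def node_gradient_norm(G, i):
    """|| nabla F(i) ||_2: Frobenius norm of the stacked edge gradients at node i."""
    return float(np.sqrt((G[i] ** 2).sum()))

# Two connected nodes with opposite features.
W = np.array([[0., 1.], [1., 0.]])
F = np.array([[1., 0.], [0., 1.]])
G = graph_gradient(W, F)
assert np.allclose(G[0, 1], [-1., 1.])
assert np.allclose(G[0, 1], -G[1, 0])   # antisymmetry on undirected graphs
```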
Definition 3
(Graph Divergence) Given a graph \(\mathcal {G} = (\mathcal {V}, \mathcal {E}, W)\) and functions \({\textbf{F}}: \mathcal V \rightarrow \mathbb R^d\), \({\textbf{g}}: \mathcal E \rightarrow \mathbb R^d\), the graph divergence is an operator \(\text {div}: \mathcal F_\mathcal E \rightarrow \mathcal F_\mathcal V\) which satisfies \(\langle \nabla {\textbf{F}}, {\textbf{g}} \rangle = \langle {\textbf{F}}, -\text {div}({\textbf{g}})\rangle\). Furthermore, the graph divergence can be computed by \((\text {div}\, {\textbf{g}})(i) = \sum ^N_{j=1} \sqrt{\frac{w_{i,j}}{d_{i,i}}}\left( {\textbf{g}}([i,j]) - {\textbf{g}}([j,i])\right).\)
Given the above definitions on graph gradient and divergence, we reach the definition of the graph p-Laplacian.
Definition 4
(p-Laplacian operator [18]) Given a graph \(\mathcal {G} = (\mathcal {V}, \mathcal {E}, W)\) and a multiple-channel signal function \({\textbf{F}}: \mathcal V \rightarrow \mathbb R^d\), the graph p-Laplacian is an operator \(\Delta _p: \mathcal F_\mathcal V \rightarrow \mathcal F_\mathcal V\) defined by \(\Delta _p {\textbf{F}}:= -\frac{1}{2}\text {div}\left( \Vert \nabla {\textbf{F}}\Vert ^{p-2} \nabla {\textbf{F}}\right) ,\)
where \(\Vert \cdot \Vert ^{p-2}\) is element-wise power over the node gradient \(\nabla {\textbf{F}}\).
Remark 1
Clearly, when \(p=2\), Eq. (3) recovers the classic graph Laplacian. When \(p =1\), Eq. (3) can be viewed as a curvature operator defined on the nodes of the graph, since it becomes \(\Delta _1 {\textbf{F}}= -\frac{1}{2}\text {div}\left( \frac{\nabla {\textbf{F}}}{\Vert \nabla {\textbf{F}}\Vert }\right)\), which aligns with the definition of the mean curvature operator in the continuous domain. Furthermore, we note that the p-Laplacian operator is linear when \(p = 2\), while in general, for \(p \ne 2\), the p-Laplacian is a non-linear operator since \(\Delta _p(a{\textbf{F}}) \ne a\Delta _p({\textbf{F}}) \,\,\, \text {for} \,\,\, a \in \mathbb R\backslash \{1\}\).
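As a numerical sanity check on Definitions 3-4 and Remark 1, the sketch below (our own illustrative helpers, assuming the normalized gradient/divergence of [64, 18]) verifies that \(\Delta_2\) coincides with the normalized Laplacian without self-loops and that \(\Delta_p\) is non-linear for \(p \ne 2\):

```python
import numpy as np

def graph_gradient(W, F):
    # (nabla_W F)([i,j]) = sqrt(w_ij/d_jj) f_j - sqrt(w_ij/d_ii) f_i, per [18]
    N, d = W.shape[0], W.sum(axis=1)
    G = np.zeros((N, N, F.shape[1]))
    for i in range(N):
        for j in range(N):
            if W[i, j] > 0:
                G[i, j] = np.sqrt(W[i, j] / d[j]) * F[j] - np.sqrt(W[i, j] / d[i]) * F[i]
    return G

def graph_divergence(W, g):
    # (div g)(i) = sum_j sqrt(w_ij/d_ii) (g([i,j]) - g([j,i])), per [64]
    N, d = W.shape[0], W.sum(axis=1)
    out = np.zeros((N, g.shape[2]))
    for i in range(N):
        for j in range(N):
            if W[i, j] > 0:
                out[i] += np.sqrt(W[i, j] / d[i]) * (g[i, j] - g[j, i])
    return out

def p_laplacian(W, F, p):
    # Delta_p F = -(1/2) div(||nabla F||^{p-2} nabla F), power on edge-gradient norms
    G = graph_gradient(W, F)
    mag = np.linalg.norm(G, axis=2, keepdims=True)
    scale = np.where(mag > 0, mag, 1.0) ** (p - 2) * (mag > 0)
    return -0.5 * graph_divergence(W, scale * G)

W = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
F = np.array([[1.], [0.], [2.]])
d = W.sum(axis=1)
L = np.eye(3) - np.diag(d ** -0.5) @ W @ np.diag(d ** -0.5)
# p = 2: linear case, coincides with the normalized Laplacian (no self-loops here).
assert np.allclose(p_laplacian(W, F, 2), L @ F)
# p != 2: non-linear, Delta_p(aF) != a * Delta_p(F).
assert not np.allclose(p_laplacian(W, 2 * F, 1.5), 2 * p_laplacian(W, F, 1.5))
```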
Definition 5
(p-eigenvector and p-eigenvalue) Let \(\psi _p({\textbf{v}}) = (\Vert v_1 \Vert ^{p-2} v_1,..., \Vert v_N \Vert ^{p-2} v_N)\) for \({\textbf{v}}\in \mathbb R^N\). With some abuse of the notation, we call \({\textbf{u}}\in \mathcal F_V\) (\(d=1\)) a p-eigenvector of \(\Delta _p\) if \(\Delta _p {\textbf{u}}= \lambda \psi _p(\textbf{u})\) where \(\lambda\) is the associated p-eigenvalue.
Note that in the above definition, we use the fact that \(\textbf{u}\) acting on all nodes gives a vector in \(\mathbb R^N\). Let \(\psi _p(\textbf{U}) = (\psi _p(\textbf{U}_{:,1}),..., \psi _p(\textbf{U}_{:, N}))\) for \(\textbf{U}\in \mathbb R^{N \times N}\). It can be shown [18] that the p-Laplacian can be decomposed as \(\Delta _p = \psi _p(\textbf{U}) \varvec{\Lambda } \psi _p(\textbf{U})^T\) for some diagonal matrix \(\varvec{\Lambda }\).
2.4 Graph framelets
A framelet is a type of wavelet frame. The study by [46] was the first to present a wavelet with a lifting scheme, providing a foundation for research on wavelet transforms on graphs. With the increase of computational power, [22] proposed a framework for wavelet transformation on graphs and employed Chebyshev polynomials to approximate the wavelets. [15] further developed tight framelets on graphs. The new design has been applied to graph learning tasks [60] with great performance enhancement over classic GNNs. Recent studies show that framelets can naturally decompose the graph signal and re-aggregate it effectively, achieving significant results on graph noise reduction [62] with the use of a double-term regularizer on the framelet coefficients. Combined with singular value decomposition (SVD), framelets have been made applicable to directed graphs [69].
A simple method of building more versatile and stable framelet families was suggested by [56] which is known as Quasi-Framelets. In this study, we will introduce graph framelets using the same architecture described in the paper [56]. We begin by defining the filtering functions for Quasi-Framelets:
Definition 6
(Filtering functions for Quasi-Framelets) We call a set of \(R+1\) positive filtering functions defined on \([0, \pi ]\), \(\mathcal {F} = \{g_0(\xi ), g_1(\xi ),..., g_R(\xi )\}\), Quasi-Framelet scaling functions if the following identity condition is satisfied: \(g_0(\xi )^2 + g_1(\xi )^2 + \cdots + g_R(\xi )^2 \equiv 1, \,\, \forall \xi \in [0, \pi ],\)
such that \(g_0\) descends from 1 to 0 and \(g_R\) ascends from 0 to 1 as frequency increases over the spectral domain \([0, \pi ]\).
Particularly \(g_0\) aims to regulate the highest frequency while \(g_R\) to regulate the lowest frequency, and the rest to regulate other frequencies in between.
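One minimal family satisfying the identity condition is a Haar-type sine/cosine pair with \(R=1\); this is our own illustrative choice (the entropy-based filters of [56] also qualify):

```python
import numpy as np

# Haar-type Quasi-Framelet pair (R = 1): g0^2 + g1^2 = 1 on [0, pi].
g0 = lambda xi: np.cos(xi / 2)   # descends from 1 to 0 over [0, pi]
g1 = lambda xi: np.sin(xi / 2)   # ascends from 0 to 1 over [0, pi]

xi = np.linspace(0, np.pi, 101)
assert np.allclose(g0(xi) ** 2 + g1(xi) ** 2, 1.0)   # identity condition
assert np.isclose(g0(0), 1.0) and np.isclose(g1(np.pi), 1.0)
```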
Consider a graph \(\mathcal {G} =(\mathcal {V}, \mathcal {E})\) with its normalized graph Laplacian \(\widetilde{\textbf{L}}\). Let \(\widetilde{\textbf{L}}\) have the eigen-decomposition \(\widetilde{\textbf{L}} = \textbf{U}\Lambda \textbf{U}^T\) where \(\textbf{U}\) is the orthogonal spectral bases with its spectra \(\Lambda = \text {diag}(\lambda _1, \lambda _2,..., \lambda _N)\) in increasing order. For a given set of Quasi-Framelet functions \(\mathcal {F} = \{g_0(\xi ), g_1(\xi ),..., g_R(\xi )\}\) defined on \([0, \pi ]\) and a given level J (\(\ge 0\)), define the following Quasi-Framelet signal transformation matrices
Note that in the above definition, m is called the coarsest scale level, which is the smallest m satisfying \(2^{-m}\lambda _N \le \pi\). Denote by \(\mathcal {W} = [\mathcal {W}_{0,J}; \mathcal {W}_{1,0};...; \mathcal {W}_{R,0}; \mathcal {W}_{1,1};...; \mathcal {W}_{R,J}]\) the stacked matrix. It can be proved that \(\mathcal {W}^T\mathcal {W} = \textbf{I}\), thus providing a signal decomposition and reconstruction process based on \(\mathcal {W}\). We call this the graph framelet transformation.
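The tight-frame property \(\mathcal {W}^T\mathcal {W} = \textbf{I}\) can be checked on a toy graph; the sketch below is our own single-level (\(J = 0\), \(R = 1\)) construction with an assumed Haar-type filter pair, so it omits the multi-level chaining of \(g_0\) across dyadic scales:

```python
import numpy as np

g0 = lambda xi: np.cos(xi / 2)   # assumed Quasi-Framelet pair, g0^2 + g1^2 = 1
g1 = lambda xi: np.sin(xi / 2)

A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])     # toy path graph
d = A.sum(axis=1)
L = np.eye(3) - np.diag(d ** -0.5) @ A @ np.diag(d ** -0.5)  # Laplacian (no self-loops, for simplicity)
lam, U = np.linalg.eigh(L)

# Coarsest scale: smallest m with 2^{-m} * lam_N <= pi.
m = max(0, int(np.ceil(np.log2(max(lam[-1], 1e-12) / np.pi))))
W0 = U @ np.diag(g0(lam / 2 ** m)) @ U.T    # low-pass transform matrix
W1 = U @ np.diag(g1(lam / 2 ** m)) @ U.T    # high-pass transform matrix
W = np.vstack([W0, W1])                     # stacked transform

# Tight frame: exact decomposition and reconstruction.
assert np.allclose(W.T @ W, np.eye(3))
X = np.array([[1.], [2.], [3.]])
assert np.allclose(W.T @ (W @ X), X)
```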
To alleviate the computational cost imposed by the eigendecomposition of the graph Laplacian, the framelet transformation matrices can be approximated by Chebyshev polynomials. Empirically, Chebyshev polynomials \(\mathcal {T}^n_r(\xi )\) with a small fixed degree n (e.g., \(n=3\)) are sufficient to approximate \(g_r(\xi )\). In the sequel, we will simplify the notation by using \(\mathcal {T}_r(\xi )\) instead of \(\mathcal {T}^n_r(\xi )\). Then the Quasi-Framelet transformation matrices defined in Eqs. (6, 7) can be approximated by,
Remark 1
The approximated transformation matrices defined in Eqs. (8, 9, 10) depend only on the graph Laplacian. For directed graphs, we directly use the Laplacian normalized by the out-degrees in our experiments. We observe that this strategy leads to improved performance in general.
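The Chebyshev approximation just described can be sketched numerically; the filter \(g_0(\xi) = \cos(\xi/2)\), the degree \(n=3\), and the stand-in diagonal Laplacian are our own assumptions for illustration:

```python
import numpy as np

g0 = lambda xi: np.cos(xi / 2)                 # an assumed Quasi-Framelet filter
xi = np.linspace(0, np.pi, 200)
cheb = np.polynomial.chebyshev.Chebyshev.fit(xi, g0(xi), deg=3, domain=[0, np.pi])
assert np.max(np.abs(cheb(xi) - g0(xi))) < 1e-2    # degree n = 3 is already close

# Apply the approximant to a Laplacian via the three-term recurrence
# T_{k+1}(x) = 2x T_k(x) - T_{k-1}(x): no eigendecomposition is needed.
L = np.diag([0.0, 1.0, 2.0])                   # stand-in Laplacian (diagonal for an easy check)
Lhat = (2 * L - np.pi * np.eye(3)) / np.pi     # map the spectrum [0, pi] onto [-1, 1]
T = [np.eye(3), Lhat]
for _ in range(2, len(cheb.coef)):
    T.append(2 * Lhat @ T[-1] - T[-2])
P = sum(c * Tk for c, Tk in zip(cheb.coef, T))
assert np.allclose(np.diag(P), g0(np.diag(L)), atol=1e-2)
```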
For a graph signal \(\textbf{X}\), the framelet (graph) convolution, similar to the spectral graph convolution, can be defined as \(\theta \star \textbf{X} = \mathcal {W}^T \text {diag}(\theta ) \mathcal {W} \textbf{X},\)
where \(\theta\) is the learnable network filter. We also call \(\mathcal {W}\textbf{X}\) the framelet coefficients of \(\textbf{X}\) in the Fourier domain. The signal is then filtered in the spectral domain according to the learnable filter \(\text {diag}(\theta )\).
2.5 Regularized graph neural network
In reality, graph data are normally large, noisy, and complex in structure [25, 29]. This brings the challenge of choosing a suitable neural network for fitting the data. It is well known that one of the common computational issues for most classic GNNs is over-smoothing, which can be quantified by the Dirichlet energy converging to zero as the number of layers increases. This observation leads to an investigation into so-called implicit layers in regularized graph neural networks.
The first paper to consider a GNN layer as a regularized signal smoothing process is [67], in which the classic GNN layers are interpreted as the solution of a regularized optimization problem, with certain approximation strategies to avoid the matrix inversion in the closed-form solution. The regularized layer can also be linked to the implicit layer [57]; more specific examples are given by [67]. In general, given the node features, the output of the GNN layer can be written as the solution of the following optimization problem
where \(\textbf{F}\) is the graph signal. Equation (12) defines the layer output as the solution of the (layer) optimization problem with a regularization term \(\text {tr}(\textbf{F}^T \widetilde{\textbf{L}}\textbf{F})\), which enforces smoothness of the graph signal. The closed-form solution is given by \(\textbf{F}= (\textbf{I}+ \frac{1}{2\mu } \widetilde{\textbf{L}})^{-1}\textbf{X}\), which is computationally inefficient because of the matrix inversion. However, we can rearrange it to \((\textbf{I}+ \frac{1}{2\mu } \widetilde{\textbf{L}})\textbf{F}= \textbf{X}\) and interpret it as a fixed-point solution to \(\textbf{F}^{(t+1)} = -\frac{1}{2\mu } \widetilde{\textbf{L}} \textbf{F}^{(t)} + \textbf{X}\). The latter is then the implicit layer in the GNN at layer t. Such an iteration is motivated by recurrent GNNs, as shown in several recent works [21, 34, 43]. Different from explicit GNNs, the output features \(\textbf{F}\) of a general implicit GNN are directly modeled as the solution of a well-defined implicit function, e.g., the first-order condition of an optimization problem. Denote the rows of \(\textbf{F}\) by \(\textbf{f}_i\) (\(i=1, 2,..., N\)) as column vectors. It is well known that the regularization term in (12) is
which is the so-called graph Dirichlet energy [64]. As proposed in [64], one can replace the graph Laplacian matrix in the above equation with the pre-defined graph p-Laplacian; then the p-Laplacian based energy, denoted \(\mathcal S_p(\textbf{F})\), can be defined as (for any \(p \ge 1\)) \(\mathcal {S}_p(\textbf{F}) = \frac{1}{2}\Vert \nabla _W \textbf{F}\Vert ^p_p,\)
where we adopt the definition of the element-wise p-norm as in [18]. Finally, the regularization problem becomes \(\textbf{F} = \arg \min _{\textbf{F}}\ \mathcal {S}_p(\textbf{F}) + \mu \Vert \textbf{F} - \textbf{X}\Vert ^2_F.\)
Given the strong generalizability of the regularization form in Eq. (14), it is natural to deploy Eq. (14) in multi-scale graph neural networks (i.e., graph framelets) to explore whether the benefits of the p-Laplacian based regularizer remain and how changes of p in the regularizer interact with the feature propagation process through each individual filtered domain. In the next section, we accordingly propose our models and provide a detailed discussion and analysis of them.
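The implicit-layer fixed-point view from Sect. 2.5 can be sketched numerically; the toy graph and \(\mu = 1\) are our own choices, and the iteration converges whenever the spectral radius of \(\widetilde{\textbf{L}}/(2\mu)\) is below 1 (an assumption that holds here):

```python
import numpy as np

A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])   # toy path graph
d = A.sum(axis=1)
Dis = np.diag(d ** -0.5)
L = np.eye(3) - Dis @ (A + np.eye(3)) @ Dis   # normalized Laplacian as in Sect. 2.1
mu = 1.0
X = np.array([[1., 0.], [0., 1.], [1., 1.]])

# Implicit-layer fixed-point iteration: F <- -(1/(2 mu)) L F + X.
F = np.zeros_like(X)
for _ in range(200):
    F = -(1.0 / (2 * mu)) * (L @ F) + X

# Closed-form solution (I + L/(2 mu))^{-1} X of the same first-order condition.
F_closed = np.linalg.solve(np.eye(3) + L / (2 * mu), X)
assert np.allclose(F, F_closed)
```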
3 The proposed models
In this section, we present our proposed models: the p-Laplacian Undecimated Framelet GNN (pL-UFG) and the p-Laplacian Fourier Undecimated Framelet GNN (pL-fUFG). In addition, we introduce a more general regularization framework and describe the algorithms of our proposed models.
3.1 p-Laplacian undecimated framelet GNN (pL-UFG)
Instead of simply taking the convolution result as the framelet layer output (derived in Eq. (11)), we apply the p-Laplacian regularization on the framelet reconstructed signal by imposing the following optimization to define the new layer: \(\textbf{F} = \arg \min _{\textbf{F}}\ \mathcal {S}_p(\textbf{F}) + \mu \Vert \textbf{F} - \mathcal {W}^T\text {diag}(\theta )\mathcal {W}\textbf{X}\Vert ^2_F,\)
where \(\mu \in (0, \infty )\). Here we recall that the first term \(\mathcal {S}_p(\textbf{F})\) is the p-Laplacian based energy defined in Eq. (13); it measures the total variation of the signal over the graph via the p-Laplacian. The second term dictates that the optimal \(\textbf{F}\) should not deviate excessively from the framelet reconstructed signal of the input signal \(\textbf{X}\). Each component of the network filter \(\theta\) in the frequency domain is applied to the framelet coefficients \(\mathcal {W}\textbf{X}\). We call the model in Eq. (15) pL-UFG2.
One possible variant to model (15) is to apply the regularization individually on the reconstruction at each scale level, i.e., for all the \(r, \ell\), define
We name the model with the propagation in the above equation pL-UFG1. In our experiments, we note that pL-UFG1 performs better than pL-UFG2.
3.2 p-Laplacian Fourier undecimated framelet GNN (pL-fUFG)
In pL-fUFG, we take a strategy of regularizing the framelet coefficients. Informed by earlier experience, we consider the following optimization problem for each framelet transformation in Fourier domain. For each framelet transformation \(\mathcal {W}_{r,j}\) defined in Eqs. (9)-(10), define
Then the final layer output is defined by the reconstruction as follows
Alternatively, we can aggregate all the regularized and filtered framelet coefficients in the following way
With the forms of both pL-UFG and pL-fUFG in place, in the next subsection we present an even more general regularization framework that incorporates the p-Laplacian with graph framelets.
3.3 More general regularization
For convenience, we write the graph gradient for multiple channel signals \(\textbf{F}\) as
or simply \(\nabla \textbf{F}\) if no confusion arises. For an undirected graph, we have \(\nabla _W \textbf{F}([i,j]) = - \nabla _W \textbf{F}([j,i])\). With this notation, we can re-write the p-Laplacian regularizer in (13) (the element-wise p-norm) as follows.
where \(v_j \sim v_i\) stands for the node \(v_j\) that is connected to node \(v_i\) and \(\nabla _W\textbf{F}(v_i)= \left( \nabla _W \textbf{F}([i,j]) \right) _{v_j: (v_i, v_j) \in \mathcal {E}}\) is the node gradient vector for each node \(v_i\) and \(\Vert \cdot \Vert _p\) is the vector p-norm. In fact, \(\Vert \nabla _W\textbf{F}(v_i)\Vert _p\) measures the variation of \(\textbf{F}\) in the neighbourhood of each node \(v_i\). Next, we generalize the regularizer by considering any positive convex function \(\phi\) as
It is clear that when \(\phi (\xi ) = \xi ^p\), we recover the p-Laplacian regularizer. In the image processing field, several penalty functions have been proposed in the literature. For example, \(\phi (\xi ) = \xi ^2\) is known in the context of Tikhonov regularization [2, 3, 30]. When \(\phi (\xi ) = \xi\) (i.e., \(p=1\)), it is the classic total variation regularization. When \(\phi (\xi ) = \sqrt{\xi ^2 + \epsilon ^2} - \epsilon\), it is referred to as regularized total variation; an example can be found in [11]. The case of \(\phi (\xi ) = r^2\log (1 + \xi ^2/r^2)\) corresponds to nonlinear diffusion [40].
In pL-UFG and pL-fUFG, we use \(\textbf{Y}\) to denote \(\mathcal {W}^T\text {diag}(\theta )\mathcal {W}\textbf{X}\) and \(\text {diag}(\theta _{r,j}) \mathcal {W}_{r,j} \textbf{X}\), respectively, and then propose the generalized p-Laplacian regularization models as
3.4 The algorithm
We derive an iterative algorithm for solving the generalized p-Laplacian regularization problem (22), presented as the following theorem.
Theorem 1
For a given positive convex function \(\phi (\xi )\), define
and denote the matrices \(\textbf{M}= [M_{i,j}]\), \(\varvec{\alpha } = \text {diag}(\alpha _{11},..., \alpha _{NN})\) and \(\varvec{\beta } = \text {diag}(\beta _{11},..., \beta _{NN})\). Then the solution to problem (22) can be solved by the following message passing process
with an initial value, e.g., \(\textbf{F}^{(0)} = \textbf{0}\).
Here \(\varvec{\alpha }^{(k)}\), \(\varvec{\beta }^{(k)}\) and \(\textbf{M}^{(k)}\) are calculated from the current features \(\textbf{F}^{(k)}\). When \(\phi (\xi ) = \xi ^p\), Eq. (23) yields the algorithm for solving pL-UFG and pL-fUFG.
Proof
Define \(\mathcal {L}^{\phi }_p(\textbf{F})\) as the objective function in Eq. (22), consider a node feature \(\textbf{f}_i\) in \(\textbf{F}\), then we have
Setting \(\frac{\partial \mathcal {L}^{\phi }_p(\textbf{F})}{\partial \textbf{f}_i} = 0\) gives the following first order condition:
This equation defines the message passing on each node \(v_i\) from its neighbour nodes \(v_j\). With the definition of \(M_{i,j}\), \(\alpha _{ii}\) and \(\beta _{ii}\), it can be turned into the iterative formula in Eq. (23). This completes the proof. \(\square\)
Remark 2
When \(p\le 1\), the objective function is not differentiable in some extreme cases, for example, when the neighbouring node signals are identical and the nodes have the same degrees. In these rare cases, the first-order condition cannot be applied in the above proof. In practice, however, we suggest the following alternative iterative algorithm to solve the optimization problem. In fact, we can split the terms in \(\mathcal {S}_p(\textbf{F})\) as
At iteration k, we take \(w^{\text {new}}_{i,j} = w_{i,j} \left\| \nabla _W \textbf{F}([i,j])\right\| ^{p-2}\) as the new edge weights, then the next iterate is defined as the solution to optimize the Dirichlet energy with the new weights, i.e.,
Thus one step of the classic GCN can be applied in each iteration to solve the p-Laplacian regularized problem (14).
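One sweep of the reweighting scheme in Remark 2 can be sketched as follows; this is our own illustration (function name, toy data, and the closed-form Dirichlet smoothing step are assumptions), with the gradient formula following [18] and a small \(\epsilon\) guarding the \(p - 2 < 0\) power:

```python
import numpy as np

def reweighted_step(W, F, X, p=1.5, mu=1.0, eps=1e-8):
    """One reweighting sweep: set w_ij <- w_ij * ||grad F([i,j])||^{p-2},
    then smooth X by the Dirichlet-energy closed form under the new weights."""
    N, d = W.shape[0], W.sum(axis=1)
    W_new = np.zeros_like(W)
    for i in range(N):
        for j in range(N):
            if W[i, j] > 0:
                # Normalized graph gradient on edge [i,j], per [18].
                g = np.sqrt(W[i, j] / d[j]) * F[j] - np.sqrt(W[i, j] / d[i]) * F[i]
                W_new[i, j] = W[i, j] * (np.linalg.norm(g) + eps) ** (p - 2)
    d_new = W_new.sum(axis=1)
    L_new = np.eye(N) - np.diag(d_new ** -0.5) @ W_new @ np.diag(d_new ** -0.5)
    # Dirichlet smoothing with the updated weights (cf. the closed form in Sect. 2.5).
    return np.linalg.solve(np.eye(N) + L_new / (2 * mu), X)

W = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
X = np.array([[1.], [0.], [2.]])
F = reweighted_step(W, X.copy(), X)     # one GCN-like smoothing step
assert F.shape == X.shape and np.all(np.isfinite(F))
```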
3.5 Interaction between p-Laplacian and framelets
In this section, we present some theoretical support for how the p-Laplacian regularizer interacts with the framelets in the model. Specifically, we show that the proposed algorithm in Eq. (23) provides a sequence of (p-Laplacian based) spectral graph convolutions and diagonal scalings of the node features over the framelet-filtered graph signals. This is indicated by the following analysis.
First, consider the iteration Eq. (23) with initial values \(\textbf{F}^{(0)} = \textbf{Y}= \mathcal {W}^T\text {diag}(\theta )\mathcal {W}\textbf{X}\) or \(\text {diag}(\theta _{r,j}) \mathcal {W}_{r,j} \textbf{X}\), without loss of generality; we have
The result \(\textbf{F}^{(K)}\) depends on the key operations \(\varvec{\alpha }^{(k)}\widetilde{\textbf{M}}^{(k)}\) for \(k = 0, 1,..., K-1\), where
where \(\widetilde{M}_{ij}\) has absorbed \(d_{i,i}\) and \(d_{j,j}\) into \(M_{ij}\) defined in Theorem 1, i.e., \(\widetilde{M}_{ij} = M_{i,j}/\sqrt{d_{i,i}d_{j,j}}\).
Denote
To introduce the following analysis, define the relevant matrices \(\widetilde{\textbf{M}}^{(k)} = [\widetilde{M}^{(k)}_{i,j}]\), \(\varvec{\alpha }^{(k)} = \text {diag}(\alpha _{ii}^{(k)} )\) and \(\textbf{W}^{(k)}_p = [w^{(k)}_{i,j}(p)]= \textbf{w}^{(k)}_p \oplus (\textbf{w}^{(k)}_p)^T\) (the Kronecker sum) with the column vector
Now, recall that the definition of the classic p-Laplacian is:
Our purpose is to show that the generalized algorithm Eq. (25) above can be implemented as a linear combination of classic p-Laplacian filtering operations. First, based on [18], the operator in Eq. (27) of the original p-Laplacian message-passing algorithm can be written in matrix form as follows
where the matrix \(\textbf{G}^{(k)}\) has elements
and the diagonal matrix \(\varvec{\alpha }^{(k)}_0\) has diagonal element defined as
Eq. (28) shows that the operation \(\Delta ^{(k)}_p(\textbf{F}^{(k)})\) can be represented as the product of a matrix (still denoted as \(\Delta ^{(k)}_p\)) and the signal matrix \(\textbf{F}^{(k)}\).
Noting that \(\textbf{G}^{(k)} = \widetilde{\textbf{M}}^{(k)}\oslash \textbf{W}^{(k)}_p\), multiplying diagonal matrix \(\varvec{\alpha }^{(k)}\) on both sides of Eq. (28) and taking out \(\textbf{F}^{(k)}\) give
As \(\varvec{\alpha }^{(k)}\) is diagonal, we can re-write the above equality as
Given that \(\textbf{W}^{(k)}_p\) is the Kronecker sum of the vector \(\textbf{w}^{(k)}_p\) and its transpose, if we abuse notation and also write \(\textbf{w}^{(k)}_p\) for the corresponding diagonal matrix, then we have
This demonstrates that the key terms \(\varvec{\alpha }^{(k)} \widetilde{\textbf{M}}^{(k)}\) in our generalized algorithm are linear in the p-Laplacian operator \(\Delta ^{(k)}_p\). As demonstrated in [18], the operator \(\Delta ^{(k)}_p\) approximately performs spectral graph convolution. Hence we conclude that the generalized p-Laplacian iteration (23) indeed performs a sequence of graph spectral convolutions (see Lemma 1 below) and gradient-based diagonal transformations (i.e., node feature scaling) over the framelet-filtered graph signals.
Lemma 1
The matrix \(\Delta ^{(k)}_p\) is SPD (see [4, 37]) and therefore admits an eigendecomposition, offering a graph spectral convolution interpretation.
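As a toy illustration of the spectral reading in Lemma 1, the following checks that a reweighted Laplacian-style matrix is symmetric with non-negative spectrum and hence admits the eigendecomposition \(U\,\text{diag}(\lambda)\,U^T\) that underpins a spectral-convolution interpretation. This is our own simplified construction for illustration, not the paper's exact \(\Delta ^{(k)}_p\).

```python
import numpy as np

def p_laplacian_matrix(W, F, p=1.5, eps=1e-8):
    """Simplified reweighted Laplacian-style matrix D_w - W_p (illustrative)."""
    grad = np.linalg.norm(F[:, None, :] - F[None, :, :], axis=-1) + eps
    W_p = W * grad ** (p - 2)              # reweighted edges, still symmetric
    return np.diag(W_p.sum(axis=1)) - W_p

rng = np.random.default_rng(0)
W = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # path graph
F = rng.normal(size=(3, 4))
L_p = p_laplacian_matrix(W, F, p=1.2)
lam, U = np.linalg.eigh(L_p)               # eigendecomposition (symmetric matrix)
```

For any positive edge weights this matrix is symmetric positive semi-definite, so filtering with it can be read as modulating the graph signal in the eigenbasis \(U\).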
Remark 3
Eq. (29) indicates that the p-Laplacian regularizer provides graph spectral convolution on top of the framelet filtering, which conceptually produces a second layer of filtering and hence further restricts the solution space. See Fig. 1 for an illustration. Interestingly, the combination of p-Laplacian regularization and framelets offers a more adaptive smoothness that suits both homophilic and heterophilic data, as shown in our experiments.
Remark 4
In terms of the asymptotic behavior of Eq. (25), one can show roughly that the elements of \(\varvec{\alpha }^{(k)}\widetilde{\textbf{M}}^{(k)}\) lie between 0 and 1, so their product converges to zero as K grows. Therefore a large K annihilates the first term in Eq. (25) but leaves the second term and the partial sum from the third. A larger value of \(\mu\) appears to speed up this convergence further, thereby shortening the time needed to find the solution; choosing it well, however, is a model selection problem for generalizability. Moreover, the inclusion of the source term \(\varvec{\beta }^{(K-1)}\textbf{Y}\) guarantees a certain amount of variation of the node features from both the low and high frequency domains of the graph framelet, and such inclusion has been shown to help the model escape the over-smoothing issue [12].
Remark 5
Compared to GPRGNN [12] in which the model outcome is obtained by
where \(\varvec{\gamma }_k \in \mathbb R^{N\times N}\) denotes the learnable generalized PageRank coefficients. As we have shown in Eq. (29), our proposed algorithm defined in Eq. (23) applies a mixed operation (spectral convolution and diagonal scaling) to the framelet graph signal outputs and thus further directs the solution space of the framelet to a reasonably defined region. Furthermore, owing to the Chebyshev polynomial approximation used in the framelet, the computational complexity of the first filtering step is low, which is helpful in defining the solution space for the second operation. This shows the efficiency of incorporating the framelet in p-Laplacian based regularization.
4 Discussions on the proposed model
In this section, we conduct a comprehensive discussion of our proposed models. We will mainly focus on exploring the properties of the generalized p-Laplacian based framelet GNN (Eq. (22)) and the iterative algorithm in Eq. (23), although we also present some conclusions for pL-UFG and pL-fUFG. Specifically, in Sect. 4.1 we discuss the denoising power of our proposed model. Section 4.2 analyzes the model's computational complexity. Section 4.3 provides a comparison between the p-Laplacian regularizer and other regularizers applied to GNNs in different learning tasks. Finally, the section concludes (Sect. 4.4) with the limitations of the model and potential directions for future studies.
4.1 Discussion on the denoising power
The denoising power of framelets was developed in [15] and has been empirically studied in past works [15, 56, 62]. Let \(\textbf{X}= \textbf{F}+ \mathbf \tau\) be the input node feature matrix with noise \(\mathbf \tau\). In [15, 62], a framelet-based regularizer was proposed for the denoising optimization problem, which can be described as:
where \(f(\cdot ,\cdot )\) is a fidelity function that takes different forms for different applications. For any graph signal \(\textbf{F}\), the graph \(\mathbb L_p\)-norm is \(\Vert \textbf{F}\Vert _{p,\mathcal G}:= \left( \sum _i \vert \textbf{f}_i\vert ^p \times d_{i,i}\right) ^{1/p}\), where \(d_{i,i}\) is the degree of the i-th node. \(\mathcal D\) is a linear transformation generated from discrete transforms, for example the graph Fourier transform or the framelet transform. Thus the regularizer assigns a higher penalty to nodes with a larger number of neighbours. Specifically, taking \(\mathcal D\) to be the framelet transform, the first term in Eq. (30) can be written as:
where \(\mathcal Z = \{ (r,\ell ): r = 1,...,R, \ell = 0,...,J \} \cup \{ (0,J)\}\). Replacing the p-Laplacian regularizer in Eq. (15) by the framelet regularizer defined in the above Eq. (31) we have:
Following [15], the framelet regularizer-based denoising problem in Eq. (30) is associated with a variational regularization problem of a form similar to Eq. (15), where the variational term \(\mathcal S_p(\textbf{F})\) regularizes the framelet objective function. Simply replacing the input of the first term in Eq. (32) by \(\nabla \textbf{F}\) and omitting node degree information, we have:
Therefore our proposed model is naturally equipped with denoising capability, and the corresponding (denoising) regularizing term aligns with the denoising regularizer developed in [64] without node degree information. However, [64] only handles \(p=1\) and \(p=2\), whereas our p-Laplacian framelet model covers all values of \(p \in \mathbb R_{+}\), allowing more flexibility and effectiveness in the model.
Furthermore, the major difference between most wavelet frame models and classic variational models is the choice of the underlying transformation (i.e., \(\mathcal D\) applied to the graph signal) that maps the data to a transformed domain in which it can be sparsely approximated. Given that both the p-Laplacian (i.e., Eq. (15)) and framelet regularizers (i.e., Eq. (31)) target the graph signal \(\textbf{F}_\mathcal Z\) produced by the sparse and tight framelet transform (via Chebyshev polynomials), this further illustrates the equivalence between the two denoising regularizers. Please refer to [15] for more details.
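As a concrete reading of the graph \(\mathbb L_p\)-norm \(\Vert \textbf{F}\Vert _{p,\mathcal G}\) appearing in the regularizer above, the following sketch assumes \(\vert \textbf{f}_i\vert\) is the Euclidean norm of node i's feature row (our assumption); the fidelity and transform terms of Eq. (30) are omitted.

```python
import numpy as np

def graph_lp_norm(F, deg, p=1.0):
    """Degree-weighted graph L_p norm  (sum_i |f_i|^p * d_ii)^(1/p),
    where |f_i| is taken as the Euclidean norm of node i's feature row."""
    node_norms = np.linalg.norm(F, axis=1)
    return np.sum(node_norms ** p * deg) ** (1.0 / p)
```

Because each node's contribution is weighted by its degree \(d_{i,i}\), higher-degree nodes incur a larger penalty, exactly as the text notes.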
4.2 Discussion on the computational complexity
In this section, we briefly discuss the computational complexity of our proposed model, specifically the generalized model defined in Eq. (22) whose solution can be approximated by the algorithm presented in Eq. (23). The primary computational cost of our model stems from the generation of the framelet decomposition matrices (\(\mathcal W\)), i.e., the framelet transformation applied to the node features. We acknowledge that generating \(\mathcal W\) involves a high computational complexity, as traditional framelet methods typically employ an eigen-decomposition of the graph Laplacian for this purpose. Consequently, the framelet transform itself exhibits a time complexity of \(\mathcal {O}(N^2(nj+1)Kc)\) and a space complexity of \(\mathcal {O}(N^2(nj+1)c)\) for a graph with N nodes and c-dimensional features. Notably, the constants n, j, and K are independent of the graph data. Existing literature [60] has demonstrated that when the input graph contains fewer than \(4\times 10^4\) nodes (with fixed feature dimension), the computational time for framelet convolution is comparable to that of a graph attention network with 8 attention heads [48]. As the number of nodes increases, the performance of graph attention networks degrades rapidly, while graph framelets maintain faster computation. However, our application of Chebyshev polynomials to approximate \(\mathcal W\) significantly reduces the associated computational cost compared to traditional methods. Additionally, we acknowledge that the inclusion of the p-Laplacian based implicit layer introduces additional computational cost to the original framelet model, primarily arising from the computation of the norm of the graph gradient, \(|\nabla \textbf{F}|\). Taking the Euclidean norm as an example, the computational cost for \(|\nabla \textbf{F}|\) scales as \(\mathcal O(N\times c)\), where N and c represent the number of nodes and the feature dimension, respectively.
Thus, even when accounting for the computation of the implicit layer, the overall cost of our proposed method remains comparable to that of the framelet model.
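To make the cost argument concrete, the following sketch shows how a Chebyshev recurrence applies a polynomial spectral filter using only matrix products, so no eigendecomposition of the Laplacian is needed. The scaling convention (a normalized Laplacian with spectrum in [0, 2], shifted to [-1, 1]) and the function name are assumptions of this sketch, not the paper's implementation.

```python
import numpy as np

def cheb_filter(L, X, coeffs):
    """Apply  sum_k c_k T_k(L_scaled) X  via the three-term Chebyshev
    recurrence T_k = 2 L_scaled T_{k-1} - T_{k-2}.  L is assumed to be a
    normalized Laplacian with eigenvalues in [0, 2], so L_scaled = L - I
    has spectrum in [-1, 1].  Each step costs one matrix product."""
    n = L.shape[0]
    Ls = L - np.eye(n)
    Tkm2, Tkm1 = X, Ls @ X                 # T_0 X and T_1 X
    out = coeffs[0] * Tkm2
    if len(coeffs) > 1:
        out = out + coeffs[1] * Tkm1
    for c in coeffs[2:]:
        Tk = 2 * (Ls @ Tkm1) - Tkm2
        out = out + c * Tk
        Tkm2, Tkm1 = Tkm1, Tk
    return out
```

With sparse matrices each recurrence step costs \(\mathcal O(|E|c)\), which is the mechanism behind the reduced cost of the Chebyshev-approximated framelet transform discussed above.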
4.3 Comparison with other regularizers and potential application scenarios
As illustrated in Sect. 3.3, GNN performance can be enhanced across different learning tasks with the assistance of different forms of regularizers. In this discussion, we position our study within the wider research landscape investigating various regularizers for enhancing the performance of graph neural networks (GNNs). We discussed the denoising power of pL-UFG in Sect. 4.1, establishing that it can be expressed as a denoising formulation. This is comparable to the approach of [38], who used a different regularizer to highlight the denoising capabilities of GNNs, showing that their regularizer can effectively reduce noise and enhance the robustness of GNN models.
Our study, however, emphasizes the unique advantages that the p-Laplacian brings, a theme also echoed in the works by [33] and [8]. Both studies incorporated the \(L_1\)-norm as a regularization term in robust principal component analysis (RPCA), showcasing its ability to recover low-rank matrices even in the presence of significant noise. Furthermore, the study by [35] reinforces the benefits of the \(L_1\)-norm in preserving discontinuity and enhancing local smoothness. Characterized by its heavy-tail property, the \(L_1\)-norm imposes less penalty on large values, thereby making it effective in handling data with substantial corruption or outliers.
Furthermore, in terms of potential practical implementations, our method can be deployed in various settings. For example, the p-Laplacian based regularizer can be used in image processing and computer vision applications, where it can help smooth out noisy or jagged images while preserving important features. In addition, since we implement our graph learning method via an optimization (regularization) framework, any practical application of GNNs (such as graph generation [26] or graph-based time series forecasting [49]) can also be deployed under our proposed method with greater flexibility.
4.4 Limitation and potential future studies
While our proposed model enjoys the outstanding properties presented above, how the p-Laplacian based regularization framework can be deployed on other types of graphs (e.g., dynamic or spatio-temporal) remains open. In fact, when the input graph is dynamic, one may require the corresponding GNN to capture the pattern of the graph's evolution (e.g., by combining the GNN with an LSTM or Transformer [39]). We note that such a requirement is beyond the capability of the current model. However, as the p-Laplacian based regularizer restricts the solution of the graph framelet, it would be interesting to make the value of p learnable throughout the training process. We leave this to future work.
Moreover, following the idea of making p a learnable parameter, one can also explore assigning different values of p to different frequency domains of the framelet. It has been verified that the low-frequency domain of the framelet induces a smoothing effect on the graph signal, whereas a sharpening effect is produced by the high-frequency domain [23]. Therefore, one may prefer a relatively large value of p in the low-frequency domain to enhance the smoothing effect when the input graph is highly homophilic, as one prefers to predict identical labels for connected nodes. On the other hand, a smaller value of p is preferred in the high-frequency domain to further sharpen the differences between node features (i.e., \(\nabla \textbf{F}\)) so that distinct labels can be produced when the input graph is heterophilic. Moreover, as the p-Laplacian based implicit layers are placed before the framelet reconstruction, it would be interesting to explore how such implicit layers affect the signal reconstruction of the framelet.
5 Experiments
In this section, we present empirical studies of pL-UFG and pL-fUFG on real-world node classification tasks with both heterophilic and homophilic graphs. We also test the robustness of the proposed models against noise. Both experiments are presented with detailed discussions of their results. In addition, we note that by adjusting the value of p in our proposed model, an ablation study is automatically conducted in our experiments. The code for our experiments can be accessed via https://github.com/superca729/pL-UFG. Lastly, it is worth noting that our proposed method has the potential to be applied to graph learning tasks other than node classification, such as graph-level classification (pooling) [60] and link prediction [31]. Although we have yet to delve into these tasks, we believe that with some simple modifications, such as deploying a readout function for graph pooling or computing the log-likelihood for link prediction, our method is capable of handling them. Similarly, our model could be beneficial in community detection tasks by identifying clusters with similar characteristics or behaviors. We leave these promising research directions to future work.
5.1 Datasets, baseline models and the parameter setting
Datasets. We use both homophilic and heterophilic graphs (datasets) from https://www.pyg.org/ to assess pL-UFG and pL-fUFG, including the benchmark heterophilic datasets Chameleon, Squirrel, Actor, Wisconsin, Texas and Cornell, and the homophilic datasets Cora, CiteSeer, PubMed, Computers, Photos, CS and Physics. In our experiments, homophilic graphs are undirected and heterophilic graphs are directed (where we observe improved performance when direction information is provided). We include the summary statistics of the datasets together with their homophily index and split ratio in Table 1.
Baseline models. We consider eight baseline models for comparison:
-
MLP: standard feedforward multilayer perceptron.
-
GCN [28]: GCN is the first of its kind to implement linear approximation to spectral graph convolutions.
-
SGC [50]: SGC reduces GCNs’ complexity by removing nonlinearities and collapsing weight matrices between consecutive layers.
-
GAT [48]: GAT is a graph neural network that applies the attention mechanism to node features to learn edge weights.
-
JKNet [55]: JKNet can flexibly leverage different neighbourhood ranges to enable better structure-aware representation for each node.
-
APPNP [19]: APPNP combines GNN with personalized PageRank to separate the neural network from the propagation scheme.
-
GPRGNN [12]: GPRGNN adaptively learns the GPR (Generalized PageRank) weights so as to jointly optimize node feature and topological information extraction, regardless of the level of homophily of a graph.
-
p-GNN [18]: p-GNN is a p-Laplacian based GNN model whose message-passing mechanism is derived from a discrete regularization framework.
-
UFG [60]: UFG is a GNN based on framelet transforms; the framelet decomposition naturally aggregates the graph features into low-pass and high-pass spectra.
The test results are reproduced using code run on our machine, which might differ from the reported results. In addition, compared to p-GNN, the extra time and space complexity induced by our algorithm is \(O(N^2(RJ+1)d)\).
Hyperparameter tuning. We use grid search to tune the hyperparameters. We test \(p \in \{1.0, 1.5, 2.0, 2.5\}\) for p-GNN, pL-UFG and pL-fUFG. The learning rate is chosen from {0.01, 0.005}. We consider the number of iterations T in the p-Laplacian message passing from \(\{4, 5\}\) after 10 warm-up steps. For homophilic datasets, we tune \(\mu \in \{0.1, 0.5, 1, 5, 10\}\), and for heterophilic graphs \(\mu \in \{3, 5, 10, 20, 30, 50, 70\}\). The framelet type is fixed as Linear (see [56]) and the level J is set to 1. The dilation scale \({s} \in \{1, 1.5, 2, 3, 6\}\), and n, the degree of the Chebyshev polynomial approximation to all g's in (15–18), is drawn from {2, 3, 7}. It is worth noting that in graph framelets, the Chebyshev polynomial is utilized to approximate the spectral filtering of the Laplacian eigenvalues. Thus, a d-degree polynomial aggregates d-hop neighbouring information for each node of the graph. Therefore, when the input graph is heterophilic, one may require d to be relatively larger, as node labels tend to differ between directly connected (1-hop) nodes. The number of epochs is set to 200, the same as for the baseline model [18].
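The grid search described above can be sketched as follows, with a placeholder `evaluate` function standing in for one train/validation run (it is not the paper's code); the homophilic \(\mu\) range is shown, and the heterophilic range can be swapped in.

```python
from itertools import product

def grid_search(evaluate):
    """Exhaustive search over the hyperparameter grid; `evaluate` maps a
    configuration dict to a validation accuracy and is a placeholder for
    one actual train/validate run."""
    grid = {
        "p":  [1.0, 1.5, 2.0, 2.5],
        "lr": [0.01, 0.005],
        "T":  [4, 5],
        "mu": [0.1, 0.5, 1, 5, 10],   # homophilic range; heterophilic differs
    }
    best_cfg, best_acc = None, -1.0
    for values in product(*grid.values()):
        cfg = dict(zip(grid.keys(), values))
        acc = evaluate(cfg)
        if acc > best_acc:
            best_cfg, best_acc = cfg, acc
    return best_cfg, best_acc
```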
5.2 Experiment results and discussion
Node classification: Tables 2 and 3 summarize the results on homophilic and heterophilic datasets. The values after ± are standard deviations, and the top three results are highlighted as First, Second, and Third. From the experimental results, we observe that pL-UFG and pL-fUFG achieve competitive performance against the baselines on most of the homophilic and heterophilic benchmarks. For performance on homophilic undirected graphs, Table 2 shows that pL-UFG2\(^{1.5}\) has the top accuracy on Cora, Photos and CS. For CiteSeer, pL-UFG1\(^{2.0}\), pL-UFG2\(^{2.0}\) and pL-fUFG\(^{2.0}\) perform best. In terms of performance on heterophilic directed graphs (Table 3), both of our models generate the highest accuracy on all benchmark datasets except Actor, where the top performance is achieved by MLP. However, we note that for Actor, our pL-UFG1\(^{1.0}\) achieves almost identical outcomes to MLP.
Consistent with our theoretical prediction in Remark 4, we discover from the experiments that the value of the trade-off term \(\mu\) for pL-UFG and pL-fUFG is significantly higher than that in p-GNN, indicating that the framelet has a major impact on the model's performance. The performance of pL-UFG, pL-fUFG and other models on heterophilic graphs is shown in Table 3: both pL-UFG and pL-fUFG can outperform MLP and other state-of-the-art GNNs at low homophily rates. In terms of denoising capacity, pL-UFG and pL-fUFG are far better than the baseline models; Figs. 6 and 7 show that both produce the top accuracies across different noise levels.
Discussion for \({p < 2}\). To better adapt to heterophilic graphs, an ideal GNN model should induce relatively high variation in the node features it generates. Therefore, compared to models with \(p=2\), a model with a \(p < 2\) regularizer imposes a lesser penalty and thus produces outputs with higher variation. This is supported by the empirical observation in Table 3, where the highest prediction accuracies for heterophilic graphs are usually achieved by our model with \(p <2\).
Discussion for p = 1. Here we specifically focus on \(p=1\) in our models. Recall that when \(p=1\), the p-Laplacian operator acting on the graph signal \(\textbf{F}\) is \(\Delta _1 \textbf{F}= -\frac{1}{2}\text {div}\left( \frac{\nabla \textbf{F}}{\Vert \nabla \textbf{F}\Vert }\right)\). Based on Remark 1, when \(p=1\), \(\Delta _1 \textbf{F}\) is equivalent to the mean curvature operator defined on an embedded surface. In analogy to differential geometry, points at which curvature cannot be properly defined are called singular points. One can similarly conclude that when a graph contains a singular node (i.e., the graph is crumpled [7]), both \(\Delta _1 \textbf{F}\) and the corresponding \(\mathcal S_p (\textbf{F})\) cannot be properly defined. Furthermore, one can easily check that, when \(p=1\), the regularization term \(\phi (\xi ) = \xi ^{p}\) produces a higher penalty than its \(\mathcal S_p(\textbf{F})\) counterpart. This is because, when \(p=1\), \(\mathcal S_p(\textbf{F}) = (\sum _i^c (\nabla \textbf{f}_i)^2)^{\frac{1}{2}}\) whereas \(\phi (\xi ) = \xi ^{1} = \sum _i \vert \nabla \textbf{f}_i\vert\). Clearly, we have \(\mathcal S_p(\textbf{F}) < \phi (\xi )\) unless all nodes in the graph are singular, or only one node is non-singular while the rest are singular.
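The inequality between \(\mathcal S_p(\textbf{F})\) at \(p=1\) and \(\phi (\xi )\) discussed above is an instance of the vector-norm inequality \(\Vert \cdot \Vert _2 \le \Vert \cdot \Vert _1\). A quick numerical check, with a random vector standing in for the per-channel gradients:

```python
import numpy as np

# For a vector of channel-wise gradients g = (grad f_1, ..., grad f_c),
# S_1 = ||g||_2 never exceeds phi = ||g||_1, with equality only when at
# most one entry is non-zero (the "singular" edge case in the text).
rng = np.random.default_rng(1)
g = rng.normal(size=8)                 # stand-in for per-channel gradients
s1 = np.sqrt(np.sum(g ** 2))           # S_p(F) at p = 1  (2-norm)
phi = np.sum(np.abs(g))                # phi(xi) = xi^1 penalty  (1-norm)
```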
5.3 Visualization on the effect of p
Based on the claims made in previous sections, increasing the value of p leads our model to exhibit a stronger smoothing effect on the node features, making it more suitable for homophilic graphs. Conversely, when p is small, the model preserves distinct node features, making it a better fit for heterophilic graphs. To validate this, we visualize the changes in the relative positions (distances) between node features for \(p=1\) (the minimum value) and \(p=2.5\) (the maximum value). We selected the Cora and Actor datasets and employed Isomap [47] to map both the initial node features and the node features generated by our model to \(\mathbb R^2\) while preserving their distances. The results are depicted in Figs. 2 and 3.
From both Figs. 2 and 3, it can be observed that when \(p=1\), the sharpening effect induced by our model causes the node features to become more distinct from each other: under the Isomap projection, the nodes are scattered apart compared to the input features. Conversely, when the relatively large value \(p=2.5\) is assigned, the model exhibits a stronger smoothing effect, resulting in all nodes being aggregated towards the center in the Isomap visualization. These observations directly support our main claim regarding the proposed model.
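The visualization step can be sketched with scikit-learn's Isomap. The synthetic features below, and the simple scaling used to mimic a smoothed model output, are our stand-ins for illustration, not the Cora/Actor features or the model's actual outputs.

```python
import numpy as np
from sklearn.manifold import Isomap

# Project raw and "smoothed" node features to R^2 with Isomap; the spread
# of the embedded points reflects the sharpening (small p) or smoothing
# (large p) effect described in the text.
rng = np.random.default_rng(0)
X_in = rng.normal(size=(100, 16))      # stand-in for input node features
X_out = 0.2 * X_in                     # stand-in for a smoothed model output
emb_in = Isomap(n_neighbors=10, n_components=2).fit_transform(X_in)
emb_out = Isomap(n_neighbors=10, n_components=2).fit_transform(X_out)
```

Since Isomap approximately preserves geodesic distances, the contracted features embed with a visibly smaller spread, mirroring the aggregation toward the center seen for \(p=2.5\).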
5.4 Experiments on denoising capacity
Following the discussion in Sect. 4.1, in this section we evaluate the denoising capacity of our proposed models (pL-UFG and pL-fUFG) on both a homophilic (Cora) and a heterophilic (Chameleon) graph. Since the node features of the included datasets are binary, we randomly assign binary noise to different proportions of the node features (i.e., \(r \in \{5\%, 10\%, 15\%, 20\%\}\)). From Figs. 6 and 7 we see that pL-UFG2\(^{1.5}\) defined in Eq. (16) outperforms the other baselines and shows the strongest robustness to the contaminated datasets. This is expected based on our discussion of the denoising power in Sect. 4.1 and the formulation of our models (Eqs. (15), (16) and (17)). The denoising capacity of our proposed model comes from two parts: the sparse approximation of the framelet decomposition and reconstruction, i.e., \(\mathcal W\) and \(\mathcal W^T\), and the variational regularizer \(\mathcal S_p(\textbf{F})\) [15]. Compared to pL-UFG1 defined in Eq. (15), pL-UFG2 applies the denoising operators \(\mathcal W\), \(\mathcal W^T\) and \(\mathcal S_{p}(\textbf{F})\) to node inputs reconstructed from both the sparsely decomposed low- and high-frequency domains, so that the denoising operators target the original graph inputs at different scales and hence naturally yield better denoising performance. In addition, without taking the reconstruction first, the term \(\Vert \textbf{F}_{r,j} - \text {diag}(\theta _{r,j}) \mathcal {W}_{r,j} \textbf{X}\Vert ^2_F\) in pL-fUFG is less sparse than its pL-UFG2 counterpart. Thus, regularizing with the same \(\mathcal S_p(\textbf{F})\), insufficiently eliminated noise leads pL-fUFG to an incorrect solution space (relative to pL-UFG2) that cannot be recovered by the additional reconstruction using \(\mathcal W^T\); this is the likely reason pL-UFG2 outperforms the other proposed models.
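The contamination protocol can be sketched as follows, under our reading that a proportion r of the binary feature entries is flipped; the function name and seeding are ours, not the paper's code.

```python
import numpy as np

def add_binary_noise(X, r, seed=0):
    """Flip a proportion r of the entries of a binary feature matrix X,
    mimicking the noise-injection protocol described above (our reading)."""
    rng = np.random.default_rng(seed)
    mask = rng.random(X.shape) < r     # each entry flipped with probability r
    return np.where(mask, 1 - X, X)
```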
In addition to the above experiment, here we show how the value of p affects the denoising power of the proposed model while keeping all other parameters constant. The results are included in Fig. 4 (homophilic graph) and Fig. 5 (heterophilic graph).
Based on these results, the performance under different values of p varies with the noise rate. For the homophilic graph, the performance of pL-UFG2\(^{1.5}\) and pL-UFG2\(^{2.0}\) is relatively stable across noise rates, while the performance for other values of p fluctuates more. For the heterophilic graph, the performance of pL-UFG1\(^{2.5}\) and pL-UFG2\(^{2.5}\) is relatively stable across noise rates, while the performance for other values of p fluctuates more.
We also note that these fluctuations are expected, as we have shown that the denoising process of our proposed model can be presented as:
where \(\mathcal Z = \{ (r,\ell ): r = 1,...,R, \ell = 1,...,J \} \cup \{ (0,J)\}\) is the set of indices of all framelet-decomposed frequency domains. It is not hard to verify that once a larger value of p is assigned, the penalty on the node feature differences (\(\nabla \textbf{F}\)) becomes greater, and therefore a stronger denoising power is induced. In terms of graph types: when the input graph is heterophilic, connected nodes tend to have distinct features in most cases, so if adding noise makes the feature differences larger, a higher value of p is preferred. Meanwhile, adding noise to a heterophilic graph could also make the feature differences smaller, in which case a large value of p may not be appropriate; this explains why most of the lines in Fig. 5 show large fluctuations. Similar reasoning applies to the denoising performance of our model on homophilic graphs, and we omit it here.
5.5 Regarding to ablation study
It is worth noting that one advantage of an adjustable p-Laplacian based regularizer is the convenience of conducting ablation studies. The key principle of an ablation study is to test whether a new component of a proposed method (in our case, the regularizer) adds advantages over baseline counterparts regardless of the newly introduced parameters. Accordingly, the ablation study of our method is naturally conducted by comparing our models (with the regularizer) against the underlying networks on both the node classification tasks and the denoising power test (Figs. 6 and 7). It is not hard to observe that in most cases, regardless of the change of p, our proposed model keeps outperforming the baseline models. However, as our proposed model is derived from a regularization framework, another parameter that could affect model performance is the value of \(\mu\). Based on Eq. (22), an increase in \(\mu\) increases the model's convexity and, as a consequence, can guide the model closer to the global optimum when \(p \ne 2\). We also observed that a smaller value of \(\mu\) together with a relatively larger value of p is more suitable for homophilic/heterophilic graphs, which implies that \(\mu\) seems to have an opposite effect on the model's adaptation power compared to p. However, quantifying the effect of \(\mu\) via a suitable measure (e.g., the Dirichlet energy [23, 45]) is beyond the scope of this paper; we leave it to future discussions.
6 Conclusion and further work
This paper showcases the application of p-Laplacian graph neural networks (GNNs) in conjunction with framelet graphs. The incorporation of p-Laplacian regularization brings remarkable flexibility, enabling effective adaptation to both homophilic undirected and heterophilic directed graphs, thereby significantly enhancing the predictive capabilities of the model. To validate the efficacy of our proposed model, we conducted extensive numerical experiments on diverse graph datasets, demonstrating its superiority over baseline methods. Notably, our model exhibits robustness against noise perturbation, even under high noise levels. These promising findings highlight the potential of our approach and warrant further investigation in several directions. For instance, delving into the mathematical properties of our model, including weak and strong convergence, analyzing the behavior of the (Dirichlet) energy, and establishing connections with non-linear diffusion equations, opens up fascinating avenues for future research.
Notes
In our practical algorithm we choose \(\textbf{F}^{(0)} = 0\); this gives \(\textbf{F}^{(1)} = \beta ^{(0)}\textbf{Y}\) with a diagonal matrix \(\beta ^{(0)}\).
References
Ahmedt-Aristizabal D, Armin MA, Denman S, Fookes C, Petersson L (2021) Graph-based deep learning for medical diagnosis and analysis: past, present and future. Sensors 21(14):4758
Assis AD, Torres LC, Araújo LR, Hanriot VM, Braga AP (2021) Neural networks regularization with graph-based local resampling. IEEE Access 9:50727–50737
Belkin M, Matveeva I, Niyogi P (2004) Tikhonov regularization and semi-supervised learning on large graphs. IEEE Int Conf Acoust Speech Signal Process 3:3–1000
Bozorgnia F, Mohammadi SA, Vejchodskỳ T (2019) The first eigenvalue and eigenfunction of a nonlinear elliptic system. Appl Numer Math 145:159–174
Bronstein MM, Bruna J, LeCun Y, Szlam A, Vandergheynst P (2017) Geometric deep learning: going beyond Euclidean data. IEEE Signal Process Mag 34(4):18–42
Bruna J, Zaremba W, Szlam A, LeCun Y (2014) Spectral networks and locally connected networks on graphs. In: Proceedings of International Conference on Learning Representations
Burda Z, Correia J, Krzywicki A (2001) Statistical ensemble of scale-free random graphs. Phys Rev E 64(4):046118
Candès EJ, Li X, Ma Y, Wright J (2011) Robust principal component analysis? J ACM (JACM) 58(3):1–37
Chen J, Wang Y, Bodnar C, Liò P, Wang YG (2022) Dirichlet energy enhancement of graph neural networks by framelet augmentation. https://yuguangwang.github.io/papers/EEConv.pdf
Chen Q, Wang Y, Wang Y, Yang J, Lin Z (2022) Optimization-induced graph implicit nonlinear diffusion. In: Proceedings of the 39th International Conference on Machine Learning
Cheng T, Wang B (2020) Graph and total variation regularized low-rank representation for hyperspectral anomaly detection. IEEE Trans Geosci Remote Sens 58(1):391–406. https://doi.org/10.1109/TGRS.2019.2936609
Chien E, Peng J, Li P, Milenkovic O (2021) Adaptive universal generalized pagerank graph neural network. In: Proceedings of International Conference on Learning Representations
Ciotti V, Bonaventura M, Nicosia V, Panzarasa P, Latora V (2016) Homophily and missing links in citation networks. EPJ Data Sci 5:1–14
Defferrard M, Bresson X, Vandergheynst P (2016) Convolutional neural networks on graphs with fast localized spectral filtering. Advances in neural information processing systems. Springer, Cham, p 29
Dong B (2017) Sparse representation on graphs by tight wavelet frames and applications. Appl Comput Harmon Anal 42(3):452–479. https://doi.org/10.1016/j.acha.2015.09.005
Fan W, Ma Y, Li Q, He Y, Zhao E, Tang J, Yin D (2019) Graph neural networks for social recommendation. In: Proceedings of WWW, pp 417–426
Frecon J, Gasso G, Pontil M, Salzo S (2022) Bregman neural networks. International conference on machine learning. PMLR, pp 6779–6792
Fu G, Zhao P, Bian Y (2022) \(p\)-Laplacian based graph neural networks. Proc Thirty-Ninth Int Conf Mach Learn 162:6878–6917
Gasteiger J, Bojchevski A, Günnemann S (2019) Predict then propagate: graph neural networks meet personalized pagerank. In: Proceedings of International Conference on Learning Representations
Gilmer J, Schoenholz SS, Riley PF, Vinyals O, Dahl GE (2017) Neural message passing for quantum chemistry. International conference on machine learning. PMLR, pp 1263–1272
Gu F, Chang H, Zhu W, Sojoudi S, El Ghaoui L (2020) Implicit graph neural networks. Advances in neural information processing systems. Springer
Hammond DK, Vandergheynst P, Gribonval R (2011) Wavelets on graphs via spectral graph theory. Appl Comput Harmon Anal 30(2):129–150
Han A, Shi D, Shao Z, Gao J (2022) Generalized energy and gradient flow via graph framelets. arXiv:2210.04124
He M, Wei Z, Xu H et al (2021) Bernnet: learning arbitrary graph spectral filters via Bernstein approximation. Adv Neural Inf Process Syst 34:14239–14251
He X, Kempe D (2015) Stability of influence maximization. In: Proceedings of the 20th ACM International Conference on Knowledge Discovery and Data Mining, pp 1256–1265
Ho J, Jain A, Abbeel P (2020) Denoising diffusion probabilistic models. Adv Neural Inf Process Syst 33:6840–6851
Hu X, Sun Y, Gao J, Hu Y, Yin B (2018) Locality preserving projection based on F-norm. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, pp 1330–1337
Kipf TN, Welling M (2017) Semi-supervised classification with graph convolutional networks. In: Proceedings of International Conference on Learning Representations
Leskovec J, Faloutsos C (2006) Sampling from large graphs. Proc ACM Int Conf Knowl Discov Data Min. https://doi.org/10.1145/1150402.1150479
Li J, Lin S, Blanchet J, Nguyen VA (2022) Tikhonov regularization is optimal transport robust under martingale constraints. arXiv:2210.01413
Lin L, Gao J (2023) A magnetic framelet-based convolutional neural network for directed graphs. IEEE Int Conf Acoust Speech Signal Process (ICASSP). https://doi.org/10.1109/ICASSP49357.2023.10097148
Liu A, Li B, Li T, Zhou P, Wang R (2022) An-gcn: an anonymous graph convolutional network against edge-perturbing attacks. IEEE Trans Neural Netw Learn Syst. https://doi.org/10.1109/TNNLS.2022.3172296
Liu G, Lin Z, Yu Y (2010) Robust subspace segmentation by low-rank representation. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp 663–670
Liu J, Kawaguchi K, Hooi B, Wang Y, Xiao X (2021) Eignn: efficient infinite-depth graph neural networks. Advances in neural information processing systems. Springer
Liu X, Jin W, Ma Y, Li Y, Liu H, Wang Y, Yan M, Tang J (2021) Elastic graph neural networks. International conference on machine learning. PMLR, pp 6837–6849
Luo D, Huang H, Ding CHQ, Nie F (2010) On the eigenvectors of \(p\)-Laplacian. Mach Learn 81(1):37–51
Ly I (2005) The first eigenvalue for the p-Laplacian operator. JIPAM J Inequal Pure Appl Math 6:91
Ma Y, Liu X, Zhao T, Liu Y, Tang J, Shah N (2021) A unified view on graph neural networks as graph signal denoising. https://openreview.net/forum?id=MD3D5UbTcb1
Manessi F, Rozza A, Manzo M (2020) Dynamic graph convolutional networks. Pattern Recognit 97:107000
Oka T, Yamada T (2023) Topology optimization method with nonlinear diffusion. Comput Methods Appl Mech Eng 408:115940. https://doi.org/10.1016/j.cma.2023.115940
Ortega A, Frossard P, Kovacević J, Moura JMF, Vandergheynst P (2018) Graph signal processing: overview, challenges, and applications. Proc IEEE 106(5):808–828
Pandit S, Chau DH, Wang S, Faloutsos C (2007) Netprobe: a fast and scalable system for fraud detection in online auction networks. In: Proceedings of the 16th International Conference on World Wide Web, pp 201–210
Park J, Choo J, Park J (2021) Convergent graph solvers. In: Proceedings of International Conference on Learning Representations
Scarselli F, Gori M, Tsoi AC, Hagenbuchner M, Monfardini G (2009) The graph neural network model. IEEE Trans Neural Netw 20(1):61–80
Shi D, Shao Z, Guo Y, Zhao Q, Gao J (2023) Revisiting generalized p-Laplacian regularized framelet GCNs: convergence, energy dynamic and training with non-linear diffusion. https://doi.org/10.48550/arXiv.2305.15639
Sweldens W (1998) The lifting scheme: a construction of second generation wavelets. SIAM J Math Anal 29(2):511–546. https://doi.org/10.1137/S0036141095289051
Tenenbaum JB, Silva Vd, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290(5500):2319–2323
Veličković P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y (2018) Graph attention networks. In: Proceedings of International Conference on Learning Representations
Wen H, Lin Y, Xia Y, Wan H, Zimmermann R, Liang Y (2023) Diffstg: probabilistic spatio-temporal graph forecasting with denoising diffusion models. arXiv preprint arXiv:2301.13629
Wu F, Zhang T, Souza AHd, Fifty C, Yu T, Weinberger KQ (2019) Simplifying graph convolutional networks. In: Proceedings of International Conference on Machine Learning
Wu J, Sun J, Sun H, Sun G (2021) Performance analysis of graph neural network frameworks. Proc IEEE Int Symp Perform Anal Syst Softw. https://doi.org/10.1109/ISPASS51385.2021.00029
Wu S, Sun F, Zhang W, Xie X, Cui B (2022) Graph neural networks in recommender systems: a survey. ACM Comput Surv 55:1–37
Wu Z, Pan S, Chen F, Long G, Zhang C, Yu PS (2020) A comprehensive survey on graph neural networks. IEEE Trans Neural Netw Learn Syst 32(1):4–24
Xu K, Hu W, Leskovec J, Jegelka S (2018) How powerful are graph neural networks? In: Proceedings of International Conference on Learning Representations
Xu K, Li C, Tian Y, Sonobe T, Kawarabayashi Ki, Jegelka S (2018) Representation learning on graphs with jumping knowledge networks. In: Proceedings of International Conference on Machine Learning
Yang M, Zheng X, Yin J, Gao J (2022) Quasi-Framelets: another improvement to graph neural networks. arXiv:2201.04728
Zhang Q, Gu Y, Mateusz M, Baktashmotlagh M, Eriksson A (2020) Implicitly defined layers in neural networks. arXiv:2003.01822
Zhang Z, Cui P, Zhu W (2022) Deep learning on graphs: a survey. IEEE Trans Knowl Data Eng 34(1):249–270. https://doi.org/10.1109/TKDE.2020.2981333
Zheng X, Liu Y, Pan S, Zhang M, Jin D, Yu PS (2021) Graph neural networks for graphs with heterophily: a survey. In: Proceedings of the AAAI Conference on Artificial Intelligence
Zheng X, Zhou B, Gao J, Wang YG, Lio P, Li M, Montufar G (2021) How framelets enhance graph neural networks. In: Proceedings of International Conference on Machine Learning
Zheng X, Zhou B, Wang YG, Zhuang X (2022) Decimated framelet system on graphs and fast G-framelet transforms. J Mach Learn Res 23(18):1–68
Zhou B, Li R, Zheng X, Wang YG, Gao J (2021) Graph denoising with framelet regularizer. arXiv:2111.03264
Zhou B, Liu X, Liu Y, Huang Y, Lio P, Wang Y (2021) Spectral transform forms scalable transformer. arXiv:2111.07602
Zhou D, Schölkopf B (2005) Regularization on discrete spaces. DAGM Symp 3663:361–368
Zhou J, Cui G, Hu S, Zhang Z, Yang C, Liu Z, Wang L, Li C, Sun M (2020) Graph neural networks: a review of methods and applications. AI Open 1:57–81
Zhu J, Yan Y, Zhao L, Heimann M, Akoglu L, Koutra D (2020) Beyond homophily in graph neural networks: current limitations and effective designs. Adv Neural Inf Process Syst 33:7793–7804
Zhu M, Wang X, Shi C, Ji H, Cui P (2021) Interpreting and unifying graph neural networks with an optimization framework. In: Proceedings of WWW
Zhu S, Pan S, Zhou C, Wu J, Cao Y, Wang B (2020) Graph geometry interaction learning. Adv Neural Inf Process Syst 33:7548–7558
Zou C, Han A, Lin L, Gao J (2022) A simple yet effective SVD-GCN for directed graphs. arXiv:2205.09335
Funding
Open Access funding enabled and organized by CAUL and its Member Institutions.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Shao, Z., Shi, D., Han, A. et al. Enhancing framelet GCNs with generalized p-Laplacian regularization. Int. J. Mach. Learn. & Cyber. 15, 1553–1573 (2024). https://doi.org/10.1007/s13042-023-01982-8