Introduction

The prediction of trip generation stands as the initial and pivotal stage within the four-step model (FSM) for traffic demand modeling, a concept extensively discussed by McNally (2007). This integral process defines the overall magnitude of travel, concurrently dissecting the travel magnitude in origins and destinations of trips into productions and attractions (McNally 2007). As a crucial component of metropolitan transportation, the nature of commuting travel is particularly distinctive; in contrast to non-scheduled travel, it exhibits a highly cyclic pattern and repeats regularly (Rodrigue 2020). Following the core principle of trip generation, the productions and attractions of commuting imply the traffic demand at commuting origins and destinations. Furthermore, the commuting generation reflects the economic interconnectivity across locations and signifies the delicate balance between jobs and housing in metropolitan areas (Latham and Pinto 2022; Zhao et al. 2011). The continuous and rapid urban development witnessed in metropolitan areas has contributed to a surge in resource scarcity, heightened traffic congestion, and increased economic challenges (Chavhan and Venkataram 2020). In light of these pressing issues, quantitative analysis and accurate forecasts of commuting generation emerge as invaluable tools for optimizing transportation resource allocation, fortifying resilient city construction, and fostering interregional economic cooperation (Chavhan and Venkataram 2018; Ganin et al. 2017; Martinus et al. 2020).

Machine learning methods have found widespread application in traffic prediction due to their proficiency in handling nonlinear correlations within traffic flow data (Nagy and Simon 2018). However, their effectiveness is constrained when extracting complex spatiotemporal features from real-world traffic behaviors (Zantalis et al. 2019). Confronting this limitation, deep learning has been employed to address the challenges posed by high-dimensional features and large traffic datasets. One notable approach within deep learning involves the use of graph convolutional networks (GCNs) and their variants (Kipf and Welling 2016). These models, recognized for their capacity to accurately represent intricate traffic topologies and comprehend changing spatiotemporal dependencies, have become prominent in the realm of traffic prediction (Mena-Oreja and Gozalvez 2020). Various studies have extended and transformed GCN models to enhance their performance, particularly in the context of predicting trip generations and traffic flows (Yin et al. 2021). Despite the success of GCN models, their application in common traffic prediction may not be entirely suitable for commuting forecasts (Yin et al. 2023). The unique nature of commuting travel introduces distinctive socioeconomic properties and individual characteristics that play pivotal roles in the prediction process (Gao et al. 2022). Metropolitan indicators exemplified by land use, gross domestic product (GDP), and housing prices have demonstrated their significance in reconstructing commuting networks (Gao et al. 2022; Spadon et al. 2019). The anthropic dimensions of metropolitan indicators like demographic occupation and education emerge as crucial factors influencing commuting behaviors (Sun et al. 2020). Despite their importance, these factors remain inadequately considered in the current research on commuting generation (Zhou and Tang 2023). 
Moreover, individual characteristics that compose commuter portraits, such as age and gender, contribute to mobility gaps among commuter groups (Rodrigue 2020). These disparities highlight differences in access to individual transportation and underscore income-related inequalities (Xu et al. 2023). Despite their acknowledged impact, how the metropolitan indicators and commuter portraits influence the prediction of commuting generation remains an open question in the literature. A comprehensive understanding of this issue is crucial for advancing the accuracy and inclusivity of commuting prediction models.

In response to these identified limitations, we proposed a metropolis-informed GCN (MetroGCN) framework embedding the topological commuter portraits. This innovative framework incorporates multi-graphs with spatial–temporal attention mechanisms to forecast commuting generation using a superior joint representation of environmental and anthropic characteristics. Our contributions manifest in three fundamental aspects:

  (1) We underscored the importance of capturing latent spatial–temporal correlations and socioeconomic semantics embedded in commuting patterns, leading to a notable enhancement of accuracy and stability in forecasting commuting generation.

  (2) We introduced a novel approach to modeling topological commuter portraits with multi-graph representations, subsequently identifying the distinct impacts of commuter portraits, encompassing age, gender, and income, on the forecasting performance of the commuting generation.

  (3) We bolstered the model's adaptability to commuting dynamics by integrating temporal demographic occupation and education as anthropic indicators, providing valuable insights into the unique roles of metropolitan indicators in predicting commuting attraction and production, respectively.

Through quantitative experiments using the Shenzhen metropolitan area as a case study, our method exhibits superiority over state-of-the-art models, thereby affirming its efficacy in advancing the field of commuting forecasts.

The remainder of this paper is organized as follows. The “Literature review” section examines related work. The “Methodology” section details the proposed method. The “Experiments” section describes the experimental design. The “Analysis and Results” section presents the experimental results. Finally, the “Conclusion” section summarizes the achievements of this study and outlines avenues for future research.

Literature review

A wealth of research has been dedicated to investigating various forecast models for higher accuracy and efficiency across transportation scenarios. Over decades of effort, traffic prediction methodologies have evolved into three overarching categories: parametric models, machine learning models, and deep learning models.

Parametric models, characterized by transparent computational structures, approach traffic prediction as a regression problem. These models are commonly employed for short-term traffic prediction and include methodologies such as the K-nearest neighbor (KNN) model, conventional vector auto-regressive (VAR) model, and autoregressive integrated moving average (ARIMA) model (Hearst et al. 1998; Zhang et al. 2013; Zivot and Wang 2006). While these models provide strong theoretical interpretability, their prediction ability is constrained when handling intricate nonlinear relationships within the traffic network. Subsequently, machine learning models emerged to address this limitation with their robust generalization capabilities. Representative models include support vector machines (SVM), the Gaussian process, and the hidden Markov model (Hearst et al. 1998; Matthews et al. 2018; Shin and Sunwoo 2018). These algorithms stand out in modeling systematic uncertainty and reorganizing latent traffic information, which notably improves the forecasting precision compared to the parametric models. As the traffic network evolves with metropolitan development, however, machine learning models fall short in simultaneously representing various spatial and temporal relationships. Deep learning models have therefore been adopted for traffic prediction because of their exceptional proficiency in extracting high-dimensional features from heterogeneous datasets (Gobezie and Fufa 2020). The classical deep learning framework in traffic applications integrates convolutional neural networks (CNNs) and long short-term memory (LSTM) networks, laying the groundwork for further investigations in this domain (Hochreiter and Schmidhuber 1997; LeCun et al. 1998; Wu and Tan 2016). Specifically, CNNs are adept at capturing spatial dynamics, whereas LSTM specializes in temporal dynamics. An alternative strategy incorporates gated recurrent unit (GRU) networks (Hussain et al. 2021), offering heightened efficiency through a reduced number of training parameters. Despite the commendable performance of CNN-based techniques in discerning spatial characteristics within Euclidean structural data, they exhibit limitations in modeling the topological structure of spatial interactions across diverse traffic scales (Zhao et al. 2019).

GCNs offer a compelling direction for enhancing traffic prediction methodologies, leveraging their robust capacity to discern the latent spatial information in non-Euclidean data (Jiang and Luo 2022). Beyond modeling topological structures, GCNs exhibit scalability in learning different ranges of spatial interactions, making them well-suited for the intricacies of metropolitan traffic networks (Shaygan et al. 2022). The foundational spatial–temporal graph convolutional network (STGCN) introduces sequentially stacked spatial–temporal convolution blocks based on Chebyshev’s spectral CNN (ChebNet) intended to capture features in both spatially interconnected locations and time series data (Defferrard et al. 2016; Yu et al. 2017). The attention-based spatial–temporal graph convolution network (ASTGCN) takes it a step further by integrating attention mechanisms with GCNs, enabling the model to effectively learn latent spatial dynamics and time-dependent relationships in transportation (Guo et al. 2019). The multiple information spatial–temporal attention-based GCN (MISTAGCN) assimilates spatial and temporal attention separately into a K-order GCN layer incorporating numerous hidden variables to rectify information imbalances inherent in multi-sourced traffic datasets (Tao et al. 2023). While GCN-based methods have demonstrated remarkable performance in general traffic predictions, their application to commuting forecasts remains limited. Drawing inspiration from Tobler’s first law of geography (Tobler 2004), a geo-contextual multitask embedding learner (GMEL) is introduced for predicting commuting flows, which encodes geographic contextual information using graph attention networks (GATs) (Liu et al. 2020). Subsequently, a hybrid model featuring a preprocessing-encoder-decoder framework is devised for commuting flow prediction, leveraging geographical semantics and regional proximity effects (Yin et al. 2023). 
To imbue the prediction model with physical meaning, a geographic competition graph (GCG) is built from the job selection process, offering an advanced method for reconstructing commuting networks (Zhou and Tang 2023).

The extant literature underscores noteworthy limitations along three pivotal dimensions. First, contemporary metropolitan indicators are deficient: they are predominantly static and fail to encapsulate the dynamic essence of commuting activities influenced by environmental shifts (Sun et al. 2020). While certain studies for traffic prediction integrate dynamic elements such as weather conditions, the efficacy of such factors in predicting commuting generations remains limited because they are governed by common societal mechanisms (Koesdwiady et al. 2016; Zhou and Tang 2023). In contrast, essential anthropic indicators in metropolitan areas are notably absent from existing commuting forecasts, a significant oversight in capturing the dynamic properties of commuting-related places (Flint et al. 2016). Second, the literature often overlooks the nuanced relationship between spatial interaction patterns and the individual characteristics of commuters (Dai et al. 2016). In other words, the prevailing focus on spatial interaction topologies derived from abstractions of traffic flows leaves a gap in understanding topological commuting structures associated with individual commuter portraits (Calabrese et al. 2013). Last, the absence of a holistic approach to learning the commuting generation from multiple metropolitan indicators and commuter portraits poses a substantial challenge. This gap impedes the development of a robust and inclusive commuting forecasting model, emphasizing the need for an integrated framework to enhance predictive accuracy in comprehending and forecasting commuting behaviors.

Methodology

Problem statement

Definition 1

Metropolitan commuting generation: The metropolitan commuting generation comprises commuting attraction and commuting production. The metropolitan area is divided into N regions of the same size, and there are T discrete time slots, each spanning 1 h. At a time slot \(t \in T\), the commuting attraction of all regions is denoted as \(C_{{A_{t} }} = \left( {a_{t}^{1} ,a_{t}^{2} , \ldots ,a_{t}^{N} } \right)\), and the commuting production is denoted as \(C_{{P_{t} }} = \left( {p_{t}^{1} , p_{t}^{2} , \ldots ,p_{t}^{N} } \right)\), where \(a_{t}^{n}\) and \(p_{t}^{n}\) are the attraction and production of a specific region \(n \in N\). Thus, the historical commuting records can be represented as \(\left[ {\left\{ {C_{{A_{1} }} ,C_{{P_{1} }} } \right\},\left\{ {C_{{A_{2} }} ,C_{{P_{2} }} } \right\}, \ldots ,\left\{ {C_{{A_{t} }} ,C_{{P_{t} }} } \right\}} \right]\).

Definition 2

Metropolitan spatial–temporal features: The metropolitan spatial–temporal features encompass four categories: socioeconomic features, land use features, demographic features, and historical commuting records of regions within the metropolitan area. Let \(x_{t}^{n} = \left( {x_{t}^{n1} ,x_{t}^{n2} , \ldots ,x_{t}^{nF} } \right) \in {\mathbb{R}}^{F}\) denote the value of all features of region n at time t. Further, let \(X_{t} = \left( {x_{t}^{1} ,x_{t}^{2} , \ldots ,x_{t}^{N} } \right) \in {\mathbb{R}}^{{\left( {F \times N} \right)}}\) represent all the feature values of all regions at time t, and \({\mathcal{X}} = \left( {X_{1} ,X_{2} , \ldots ,X_{T} } \right) \in {\mathbb{R}}^{{\left( {F \times N \times T} \right)}}\) denote all the feature values of all regions over the T time slots.

Definition 3

Metropolitan commuting network: The metropolitan commuting network in this study is composed of three categories of undirected graphs \(G = \left( {V,E,A} \right)\). In this context, V represents a set of nodes \(v_{i} \in V_{N}\) corresponding to N regions. E represents a set of edges \(e_{ij} = \left( {v_{i} ,v_{j} } \right) \in E\) denoting the connections between nodes \(v_{i}\) and \(v_{j}\). \(A \in {\mathbb{R}}^{N \times N}\) denotes the adjacency matrix of graph G. The three graph categories used in this study are as follows:

  (1) Commuting distance graph \(G_{D}\) describes the distance between the central locations of regions in the metropolitan area.

  (2) Spatial interaction graph \(G_{I}\) illustrates the commuting pattern constructed by commuting flows between regions in the metropolitan area.

  (3) Topological commuter portrait graph (TCPG) \(G_{C}\) explicitly characterizes the spatial correlation with various commuter portraits and topological commuting structures.

Problem

By leveraging all categories of historical features \({\mathcal{X}} = \left( {X_{1} ,X_{2} , \ldots ,X_{t} } \right)\) of all nodes in the commuting network across the past t time slots and incorporating graph structures \(\left\{ {G_{D} ,G_{I} ,G_{C} } \right\}\), this study aims to forecast the future commuting generation sequences of all nodes in the entire commuting network over the upcoming \(T_{s}\) time slots. The sequences of commuting generation to forecast are denoted as

\({\mathcal{Y}} = \left( {Y_{1} ,Y_{2} , \ldots ,Y_{{T_{s} }} } \right) = \left[ {\left\{ {\hat{C}_{{A_{t + 1} }} ,\hat{C}_{{P_{t + 1} }} } \right\},\left\{ {\hat{C}_{{A_{t + 2} }} ,\hat{C}_{{P_{t + 2} }} } \right\}, \ldots ,\left\{ {\hat{C}_{{A_{{t + T_{s} }} }} ,\hat{C}_{{P_{{t + T_{s} }} }} } \right\}} \right] \in {\mathbb{R}}^{{N \times T_{s} }}\).
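The mapping from past slots to future slots can be illustrated with a sliding-window sketch. This is only an illustration under stated assumptions: the function name `make_windows`, the array layouts `(F, N, T)` for features and `(2, N, T)` for attraction and production, and the window lengths are hypothetical choices that do not come from the text.

```python
import numpy as np

def make_windows(features, generation, t_in, t_out):
    """Slice (history, future) training pairs from full-length series.

    features:   array of shape (F, N, T), all features of all regions.
    generation: array of shape (2, N, T), attraction and production
                (this stacked layout is an assumption of the sketch).
    t_in:       number of past slots fed to the model.
    t_out:      number of future slots to forecast (T_s in the text).
    """
    total = features.shape[-1]
    inputs, targets = [], []
    for start in range(total - t_in - t_out + 1):
        split = start + t_in
        inputs.append(features[..., start:split])             # past t_in slots
        targets.append(generation[..., split:split + t_out])  # next t_out slots
    return np.stack(inputs), np.stack(targets)
```

Each returned pair holds one history window and the commuting generation of the slots that immediately follow it.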

Framework overview

The proposed method for forecasting commuting generation comprises three key stages, as illustrated in Fig. 1. In the initial phase, we engaged in geostatistical feature engineering by drawing on various types of metropolitan area information to refine the initially heterogeneous inputs for our model. Environmental indicators encompass socioeconomic factors such as house prices and GDP, combined with the spatial distribution of significant land-use categories. Anthropic indicators, including demographic data, involve temporal population dynamics across occupational groups, educational levels, and dwelling types. Historical commuting records provide details such as the origin–destination specifics, departure and arrival times, and travel distance. In parallel, we constructed commuting graphs, specifically employing TCPGs to generate the essential adjacency matrix for the model. Moving to the second and third stages, we developed the MetroGCN model and leveraged it for forecasting commuting generation. This process involves experimenting with distinct combinations of features and graphs to optimize the predictive accuracy.

Fig. 1 Framework of commuting generation forecasts with MetroGCN and TCPGs

Geostatistical feature engineering

The primary phase of data-driven traffic issue prediction hinges on the pivotal process of geostatistical feature engineering, a critical step aimed at enriching the integration of the spatial–temporal context to bolster the overall performance of the predictive model. Our study employs a comprehensive workflow comprising four key strategies. Initially, we confront the challenge of heterogeneity in geostatistical data, a formidable obstacle to effective feature fusion manifesting across spatial and temporal dimensions with variations in scale, resolution, and sample size. To navigate this issue, we implemented a data alignment strategy using the empirical Bayesian Kriging method (Krivoruchko 2012), an approach that mitigates data sparsity and inconsistencies by estimating missing values based on the spatial relationships within the data. This ensures the harmonization of spatial–temporal properties across diverse metropolitan features with historical commuting records. Subsequently, a pivotal aspect of our methodology involves generating a multi-dimensional feature fusion matrix from the aligned datasets specifically mapped to target regions and time slots. An essential operation in this phase is to translate data from the geographical coordinate system into the uniquely index-encoded spatial–temporal tensors. Furthermore, while an elevated feature dimensionality enhances the model’s capacity to discern intricate patterns in commuting data, it also introduces challenges such as heightened computational demands and increased risks of overfitting. To counteract these challenges, we incorporated t-distributed stochastic neighbor embedding (t-SNE), a nonlinear unsupervised dimensionality reduction technique (Van der Maaten and Hinton 2008). It preserves the pairwise similarities between data in an optimized lower-dimensional space, striking a delicate balance between model complexity and computational efficiency. 
Last, acknowledging the diverse semantics inherent in selected metropolitan features, we introduced an adaptive feature normalization strategy. This nuanced approach entails a comprehensive assessment of each feature’s nature, measurement units, and contextual relevance. We then applied customized normalization techniques to quantitative measures and categorical features based on this assessment, facilitating meaningful comparisons and interactions between features during the model training phase.

Commuting graph construction

To ensure a comprehensive representation of diverse facets of the commuting landscape, we applied a systematic methodology of graph construction that integrates key aspects of commuting patterns and metropolitan information. Three categories of undirected graphs are proposed in this study: a commuting distance graph, a spatial interaction graph, and TCPGs.

Commuting distance graph

Distance plays a crucial role in the forecasting of metropolitan commuting generation. Guided by the first law of geography, which posits that adjacent regions tend to have similarities in geographical attributes, including geostatistical features in the metropolitan area, we examined the impact of distance on commuting generations. We constructed a distance graph using the reciprocal of the spherical distance between the geographic coordinates of the regions’ centers. The distance values, measured in kilometers, serve to quantify the geographical weight between regions. The weights increase with shorter distances, reflecting the heightened geographical significance of closer spatial relationships.

$$A_{D} = \left[ {\begin{array}{*{20}c} 0 & {\frac{1}{{d_{12} }}} & \cdots & {\frac{1}{{d_{1N} }}} \\ {\frac{1}{{d_{21} }}} & 0 & \cdots & {\frac{1}{{d_{2N} }}} \\ \vdots & \vdots & \ddots & \vdots \\ {\frac{1}{{d_{N1} }}} & {\frac{1}{{d_{N2} }}} & \cdots & 0 \\ \end{array} } \right]$$
(1)
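A minimal sketch of Eq. (1) is given below, assuming the spherical distance is computed with the haversine formula and a mean Earth radius of 6371 km; neither choice is stated in the text, and the function names are hypothetical. Region centers are assumed to be distinct so that no distance is zero.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def distance_adjacency(centers):
    """Build A_D with entries 1 / d_ij and a zero diagonal.

    centers: list of (lat, lon) tuples, one per region center (assumed distinct).
    """
    n = len(centers)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = haversine_km(*centers[i], *centers[j])
            adj[i][j] = adj[j][i] = 1.0 / d  # closer regions get larger weights
    return adj
```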

Spatial interaction graph

Spatial interaction embodied as the flow of population movement adeptly captures complementary spatial–temporal dependencies across various ranges. Regions engaged in spatial interaction are categorized into origins and destinations representing nodes of outbound and inbound commuting flows, respectively. The spatial interaction intensity is measured by the periodical volume of commuting flows between regions (Liu et al. 2015). Expanding on this concept, we constructed a graph incorporating spatial interaction intensity and shed light on how metropolitan spatial interaction patterns enhance the forecasting of commuting generations.

$$A_{I} = \left[ {\begin{array}{*{20}c} 0 & {f_{12} } & \cdots & {f_{1N} } \\ {f_{21} } & 0 & \cdots & {f_{2N} } \\ \vdots & \vdots & \ddots & \vdots \\ {f_{N1} } & {f_{N2} } & \cdots & 0 \\ \end{array} } \right]$$
(2)
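As an illustration of Eq. (2), the sketch below accumulates origin–destination records into a flow matrix. The record layout `(origin, destination, volume)` and the function name are assumptions of this sketch; the text does not state whether \(f_{ij}\) and \(f_{ji}\) are symmetrized, so the volumes are kept as recorded.

```python
import numpy as np

def interaction_adjacency(flows, n_regions):
    """Build A_I from (origin, destination, volume) records.

    Region indices are assumed to be integers in [0, n_regions).
    """
    adj = np.zeros((n_regions, n_regions))
    for origin, destination, volume in flows:
        if origin != destination:  # Eq. (2) has a zero diagonal
            adj[origin, destination] += volume
    return adj
```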

Topological commuter portrait graph

The individual portraits of commuters help to characterize their socioeconomic roles as metropolitan residents and employees. The topological structure of these commuter portraits reflects the spatial correlation between regions from the perspective of the commuting participants and offers innovative indicators for forecasting commuting generation. We constructed TCPGs based on the three key commuter attributes of age, gender, and income. Additionally, we established a baseline scenario without any specific commuter portraits. For TCPG-Age, TCPG-Gender, and TCPG-Base, we applied the linear correlation method of the Pearson correlation. In the case of TCPG-Income, we employed the monotonic correlation method of the Spearman correlation. Using correlation coefficients as edge weights in TCPGs facilitates a direct representation of commuter-related connectivity between regional nodes. Compared to aggregating commuter portraits into node attributes, this method provides a more robust extraction of spatial dependence by reducing information loss during node-to-node information propagation.

For TCPG-Base, we represented the Pearson correlation coefficient between each pair of regions (i, j) as \(\rho_{ij}\). The commuting generation in region i and region j at time slot t, calculated as the sum of commuting attraction and commuting production, is symbolized as \(c_{t}^{i}\) and \(c_{t}^{j}\), respectively.

$$c_{i} = \frac{1}{T}\mathop \sum \limits_{t = 1}^{T} c_{t}^{i} = \frac{1}{T}\mathop \sum \limits_{t = 1}^{T} \left( {a_{t}^{i} + p_{t}^{i} } \right)$$
(3)
$$\rho_{ij} = \frac{{\mathop \sum \nolimits_{t = 1}^{T} \left( {c_{t}^{i} - c_{i} } \right)\left( {c_{t}^{j} - c_{j} } \right)}}{{\sqrt {\mathop \sum \nolimits_{t = 1}^{T} \left( {c_{t}^{i} - c_{i} } \right)^{2} } \sqrt {\mathop \sum \nolimits_{t = 1}^{T} \left( {c_{t}^{j} - c_{j} } \right)^{2} } }}$$
(4)
$$A_{C} \left( {Base} \right) = \left[ {\begin{array}{*{20}c} 1 & {\rho_{12} } & \cdots & {\rho_{1N} } \\ {\rho_{21} } & 1 & \cdots & {\rho_{2N} } \\ \vdots & \vdots & \ddots & \vdots \\ {\rho_{N1} } & {\rho_{N2} } & \cdots & 1 \\ \end{array} } \right]$$
(5)
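Equations (3) to (5) can be sketched with NumPy as follows. The function name and the `(N, T)` array layout are assumptions; `np.corrcoef` computes the same Pearson coefficient as Eq. (4) when each row holds the time series of one region. A region whose generation is constant over time has zero variance and would yield undefined entries, a case this sketch does not handle.

```python
import numpy as np

def tcpg_base(attraction, production):
    """Pearson correlation matrix A_C(Base) between regional generation series.

    attraction, production: arrays of shape (N, T).
    """
    generation = np.asarray(attraction) + np.asarray(production)  # c_t^i = a_t^i + p_t^i
    return np.corrcoef(generation)  # rows are regions, columns are time slots
```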

For TCPG-Age, commuters are stratified into three age cohorts: youth, middle age, and old age. We calculated the commuting generation across all regions at each time slot for these specific age cohorts. We deduced the Pearson correlation coefficients \(\rho_{ij}\) for each age category yielding three correlation matrices denoted as \(A_{C} \left( {youth} \right)\), \(A_{C} \left( {middle} \right)\), and \(A_{C} \left( {old} \right)\). To preserve the directional information inherent in Pearson correlation, the synthesis of TCPG-Age entails the Hadamard multiplication of these distinct age correlation matrices.

$${\text{A}}_{C} \left( {Age} \right)_{ij} = [{\text{A}}_{C} \left( {youth} \right) \circ {\text{A}}_{C} \left( {middle} \right) \circ {\text{A}}_{C} \left( {old} \right)]_{ij}$$
(6)

For TCPG-Gender, commuters are grouped into two distinct categories: male and female. We tailored the calculation of commuting generation for all regions at each time slot to these gender distinctions. We independently computed the Pearson correlation coefficients \(\rho_{ij}\) for each gender group, resulting in two correlation matrices denoted as \(A_{C} \left( {male} \right)\) and \(A_{C} \left( {female} \right)\). Subsequently, we formulated the TCPG-Gender by the Hadamard multiplication of gender-specific correlation matrices.

$${\text{A}}_{C} \left( {Gender} \right)_{ij} = [{\text{A}}_{C} \left( {male} \right) \circ {\text{A}}_{C} \left( {female} \right)]_{ij}$$
(7)
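The group-wise construction in Eqs. (6) and (7) can be sketched as below, assuming the commuting generation of each group is available as an `(N, T)` array; the dictionary input and the function name are hypothetical. Because the matrices are multiplied elementwise, the sign of each combined entry follows the usual product rule, so two negative coefficients yield a positive weight.

```python
import numpy as np

def tcpg_from_groups(group_generation):
    """Hadamard product of per-group Pearson correlation matrices.

    group_generation: dict mapping a group label (e.g. "male", "female")
                      to an (N, T) array of commuting generation.
    """
    matrices = [np.corrcoef(series) for series in group_generation.values()]
    combined = matrices[0].copy()
    for matrix in matrices[1:]:
        combined = combined * matrix  # elementwise (Hadamard) product
    return combined
```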

For TCPG-Income, we categorized the income of commuters into K ordinal levels. The commuting generation for a specific region n at time slot t and income level k is represented as \((c_{t}^{n} )^{k} = (a_{t}^{n} )^{k} + (p_{t}^{n} )^{k}\). The income rank of region n at time slot t is determined by selecting the income level with the largest commuting generation, denoted as \(r_{n} = {\text{max}}\{ (c_{t}^{n} )^{1} ,(c_{t}^{n} )^{2} , \ldots ,(c_{t}^{n} )^{K} \}\). We computed the Spearman correlation coefficient \(\sigma_{ij}\) between each pair of regions (i, j) based on the difference of income ranks \(r_{i}\) and \(r_{j}\).

$$\sigma_{ij} = 1 - \frac{{6\mathop \sum \nolimits_{t = 1}^{T} \left( {r_{i} - r_{j} } \right)^{2} }}{{T\left( {T^{2} - 1} \right)}}$$
(8)
$$A_{C} \left( {Income} \right) = \left[ {\begin{array}{*{20}c} 1 & {\sigma_{12} } & \cdots & {\sigma_{1N} } \\ {\sigma_{21} } & 1 & \cdots & {\sigma_{2N} } \\ \vdots & \vdots & \ddots & \vdots \\ {\sigma_{N1} } & {\sigma_{N2} } & \cdots & 1 \\ \end{array} } \right]$$
(9)
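A direct transcription of Eq. (8) is sketched below, taking the per-slot income ranks of two regions as given inputs; the function name is hypothetical. This closed form coincides with the Spearman coefficient when each sequence is a tie-free ranking. How ties among income levels are treated is not stated in the text, so the sketch simply evaluates the formula as written.

```python
def spearman_from_ranks(ranks_i, ranks_j):
    """Evaluate Eq. (8) for two equally long rank sequences over T time slots."""
    t = len(ranks_i)
    if t != len(ranks_j) or t < 2:
        raise ValueError("rank sequences must have equal length of at least 2")
    squared_diff = sum((a - b) ** 2 for a, b in zip(ranks_i, ranks_j))
    return 1 - 6 * squared_diff / (t * (t ** 2 - 1))
```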

MetroGCN architectures

The MetroGCN’s architecture shown in Fig. 2 is intricately designed for commuting forecasts, unfolding through an assembly of spatial–temporal blocks augmented with cutting-edge attention mechanisms. These blocks, spanning multiple layers, collectively extract dynamic spatial–temporal correlations in depth. Within each spatial–temporal block, a spatial attention module refines the weights influencing connected nodes through graph convolution by incorporating attentional information. Following that, a temporal attention module is applied to a standard convolution along the time dimension for updating node features with information from neighbor time slots. This dual attention framework allows MetroGCN to selectively emphasize the most relevant spatial and temporal features, improving the model’s accuracy and interpretability.

Fig. 2 Architecture of MetroGCN with TCPGs

The core of MetroGCN comprises two pivotal spatial–temporal blocks. The first block processes the input feature tensor \({\mathcal{X}}\) alongside the commuting distance graph \(G_{D}\) and the spatial interaction graph \(G_{I}\). This foundational block is crucial for capturing featural dependence and transforming complex metropolitan features into graph node attributes. Subsequent processing derives a series of latent variables through the GCN-based spatial–temporal convolution, laying the groundwork for a comprehensive analysis of commuting patterns. The second block then receives one of the topological commuter portrait graphs \(G_{C}\) aligned over a pre-defined time sequence in conjunction with the derived latent variables. This block focuses on fine-tuning the approximation of the forecasting target, mirroring the attention-driven convolution in the previous block. To keep the output’s shape consistent with the initial commuting generation, MetroGCN culminates in a fully connected layer that translates the high-level representations learned by the model into commuting forecasts. Furthermore, a residual connection is integrated into each spatial–temporal block to boost training efficiency and facilitate model convergence.

In each spatial–temporal block, the model processes a batch of inputs, generating the source hidden state \(\overline{H}_{s} \in {\mathbb{R}}^{b \times F \times N \times T}\), alongside the corresponding target hidden state \(H_{s}\) in the same dimensional space. The spatial attention denoted as \(U_{s} \in {\mathbb{R}}^{N \times N}\) is precisely defined by

$$U_{{s_{ij} }} = \frac{{{\text{exp}}\left( {W_{{s_{ij} }} } \right)}}{{\mathop \sum \nolimits_{j = 1}^{N} {\text{exp}}\left( {W_{{s_{ij} }} } \right)}}$$
(10)
$$W_{s} = \left( {\left( {H_{s} \alpha_{1} } \right)\alpha_{2} } \right)\left( {\alpha_{3} \left( {\alpha_{4} \overline{H}_{s} } \right)} \right).$$
(11)

In this context, \(\alpha_{1} ,{ }\alpha_{2} ,{ }\alpha_{3} ,{ }\alpha_{4}\) are learnable parameter matrices for spatial attention. The value of \(U_{{s_{ij} }}\) statistically characterizes the spatial correlation between graph nodes \(v_{i}\) and \(v_{j}\). Similarly, the temporal attention \(U_{t} \in {\mathbb{R}}^{T \times T}\) is defined as

$$U_{{t_{ij} }} = \frac{{{\text{exp}}\left( {W_{{t_{ij} }} } \right)}}{{\mathop \sum \nolimits_{j = 1}^{T} {\text{exp}}\left( {W_{{t_{ij} }} } \right)}}$$
(12)
$$W_{t} = \left( {\left( {H_{t} \beta_{1} } \right)\beta_{2} } \right)\left( {\beta_{3} \left( {\beta_{4} \overline{H}_{t} } \right)} \right).$$
(13)

Here, \(\beta_{1}\), \(\beta_{2}\), \(\beta_{3}\), and \(\beta_{4}\) are learnable parameter matrices for temporal attention. The value of \(U_{{t_{ij} }}\) signifies the statistical representation of the temporal correlation between time slots \(t_{i}\) and \(t_{j}\). The use of learnable parameter matrices enables the model to dynamically adapt and effectively capture nuanced information of spatial–temporal relationships between nodes and time slots.
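Equations (10) and (12) share the same row-wise softmax normalization, which can be sketched as follows. The score matrix is taken as a given input because the shapes of the learnable matrices in Eqs. (11) and (13) are not specified in the text; subtracting the row maximum is a standard numerical-stability step added here and does not change the result.

```python
import numpy as np

def row_softmax(scores):
    """Normalize each row of a score matrix (W_s or W_t) into attention weights."""
    scores = np.asarray(scores, dtype=float)
    shifted = scores - scores.max(axis=-1, keepdims=True)  # avoids overflow in exp
    weights = np.exp(shifted)
    return weights / weights.sum(axis=-1, keepdims=True)
```

Each row of the result sums to one, so entry (i, j) can be read as the relative weight that node or time slot i assigns to j.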

To leverage the topological properties of the commuting networks, we implemented graph convolution, substituting the conventional convolution operator with linear operators diagonalizing in the Fourier domain (Henaff et al. 2015). Given a batch of inputs \({\mathcal{X}} \in {\mathbb{R}}^{b \times F \times N \times T}\), we initially applied spatial attention to the input, resulting in \({\mathcal{X}}_{s} = {\mathcal{X}}U_{s}\). We then fed this transformed input into the graph convolution, which is defined as

$${\mathcal{X}}_{g} = g_{\theta } *_{G} {\mathcal{X}}_{s} = g_{\theta } \left( L \right){\mathcal{X}}_{s} .$$
(14)

We applied the graph convolution operator, denoted as \(*_{G}\), in conjunction with a kernel \(g_{\theta }\) to filter the features of graph nodes and derive the underlying graph structures through analysis of the Laplacian matrix \(L\) and its eigenvalues (Bruna et al. 2013). However, performing eigenvalue decomposition on large-scale graphs is computationally intensive. To address this challenge, we employed Chebyshev polynomials (Simonovsky and Komodakis 2017) to approximate the kernel, which avoids the explicit eigenvalue decomposition and improves computational efficiency:

$${\mathcal{X}}_{g} = \mathop \sum \limits_{k = 0}^{K - 1} (U_{s} \odot F_{k} \left( {\tilde{L}} \right)){\mathcal{X}}{\Theta }_{k}$$
(15)

In definition (15), \(\tilde{L} = \frac{2}{{\lambda_{max} }}L - I_{N}\), where \(\lambda_{max}\) denotes the largest eigenvalue of \(L\) and \(I_{N}\) is the identity matrix. \({\Theta }_{k}\) represents the convolution kernel parameters of the \(k\)th term. Each Chebyshev polynomial term \(F_{k}\) is combined with the spatial attention matrix \(U_{s}\) through the Hadamard product \(\odot\), which updates the spatial relationships between a node and its neighbors from order 0 to order \(K - 1\).
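The Chebyshev terms follow the standard recurrence \(F_{0} = I_{N}\), \(F_{1} = \tilde{L}\), \(F_{k} = 2\tilde{L}F_{k - 1} - F_{k - 2}\). The sketch below evaluates Eq. (15) for a single time slot under two assumptions that the text does not state: \(L\) is taken to be the symmetric normalized Laplacian, and the input is an \(N \times F_{in}\) feature matrix with \({\Theta }_{k}\) of shape \(F_{in} \times F_{out}\). The toy graph and the uniform attention matrix are illustrative only.

```python
import numpy as np

def scaled_laplacian(adj):
    """L_tilde = (2 / lambda_max) L - I_N, with L assumed to be the
    symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2}."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    lap = np.eye(n) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    lam_max = np.linalg.eigvalsh(lap).max()
    return 2.0 * lap / lam_max - np.eye(n)

def cheb_polynomials(l_tilde, K):
    """Return [F_0, ..., F_{K-1}] using F_k = 2 L_tilde F_{k-1} - F_{k-2}."""
    polys = [np.eye(l_tilde.shape[0]), l_tilde.copy()]
    for _ in range(2, K):
        polys.append(2.0 * l_tilde @ polys[-1] - polys[-2])
    return polys[:K]

def attention_cheb_conv(x, u_s, polys, theta):
    """Eq. (15) for one time slot: x (N, F_in), u_s (N, N), theta (K, F_in, F_out)."""
    return sum((u_s * f_k) @ x @ theta[k] for k, f_k in enumerate(polys))

# Toy 4-node path graph with K = 3, the polynomial order used in the experiments.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
l_tilde = scaled_laplacian(adj)
polys = cheb_polynomials(l_tilde, K=3)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 2))             # N = 4 nodes, F_in = 2 features
theta = rng.normal(size=(3, 2, 5))      # K = 3, F_in = 2, F_out = 5
u_s = np.full((4, 4), 0.25)             # placeholder uniform attention
x_g = attention_cheb_conv(x, u_s, polys, theta)
```

Only matrix products with \(\tilde{L}\) are needed inside the layer; the single eigenvalue computation for \(\lambda_{max}\) can be done once per graph.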

Because the graph convolution captures spatial correlations in commuting networks, we appended a temporal convolution layer to further capture dynamic correlations between consecutive time slots. The input to this layer is weighted by the temporal attention matrix \(U_{t}\).

$${\mathcal{X}}_{t} = LeakyReLU\left( {\phi {*}\left( {{\mathcal{X}}_{g} U_{t} } \right)} \right)$$
(16)

LeakyReLU is the activation function, \(\phi\) represents the parameters of the temporal convolution kernel, and \({*}\) denotes the standard convolution operation.
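A minimal NumPy sketch of Eq. (16) for one sample follows. The kernel size, the zero padding that keeps the number of time slots unchanged, and the negative slope of 0.01 are assumptions, since the text does not report them; the convolution is implemented as a cross-correlation, as is common in deep learning libraries.

```python
import numpy as np

def leaky_relu(x, slope=0.01):
    """LeakyReLU with an assumed negative slope."""
    return np.where(x > 0, x, slope * x)

def temporal_block(x_g, u_t, phi, slope=0.01):
    """X_t = LeakyReLU(phi * (X_g U_t)) for one sample.
    x_g: (N, F, T), u_t: (T, T), phi: (F_out, F, k) with odd k (assumed)."""
    n, f, t = x_g.shape
    f_out, _, k = phi.shape
    x_att = x_g @ u_t                                    # apply temporal attention, (N, F, T)
    pad = k // 2
    x_pad = np.pad(x_att, ((0, 0), (0, 0), (pad, pad)))  # zero-pad the time axis
    out = np.zeros((n, f_out, t))
    for s in range(t):
        window = x_pad[:, :, s:s + k]                    # (N, F, k)
        out[:, :, s] = np.einsum("nfk,ofk->no", window, phi)
    return leaky_relu(out, slope)

# Toy example: 3 nodes, 2 features, 6 time slots, kernel size 3.
rng = np.random.default_rng(1)
x_g = rng.normal(size=(3, 2, 6))
u_t = np.eye(6)                         # identity attention, for illustration
phi = rng.normal(size=(4, 2, 3))
x_t = temporal_block(x_g, u_t, phi)
```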

Experiments

The experiments aim to validate the effectiveness of our model, MetroGCN, in forecasting commuting generation within a metropolitan area. By focusing on the complexities of multi-centric commuting patterns, we assess the model’s accuracy and applicability in a real-world setting that exemplifies the challenges of a dynamic metropolitan region.

Study area and datasets

The study area is the Shenzhen Metropolitan Area (SZMA), shown in Fig. 3, which comprises four key cities in Guangdong Province, China. Shenzhen serves as the central hub, complemented by three strategic sub-centers: Guangzhou, Huizhou, and Dongguan. The formation of SZMA is closely tied to its role in connecting with Hong Kong’s northern metropolis, fostering financial and industrial cooperation among manufacturing clusters, sci-tech innovation platforms, and modern service centers. The urban plan for SZMA emphasizes robust transportation coordination, aiming to alleviate pressure on transport hubs in the central areas of each city. SZMA spans approximately 23,125 km² and accommodated a population of 528.83 million as of 2022. We divided the study area into 11,776 \(\left( {128\; \times \;92} \right)\) regions, each with a spatial resolution of 2 km. We adopted the WGS84 geographic coordinate system and the UTM Zone 50N projection to ensure precision and consistency in spatial analysis and modeling throughout the study.
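The gridding can be sketched as a mapping from projected coordinates to a region index. The grid origin (`X0`, `Y0`), the assignment of 128 to columns and 92 to rows, and the row-major numbering are hypothetical choices made for illustration; the source only gives the grid dimensions and the 2 km resolution.

```python
CELL_M = 2_000                 # 2 km cell size, in metres
N_COLS, N_ROWS = 128, 92       # assumed orientation of the 128 x 92 grid
X0, Y0 = 100_000.0, 2_400_000.0  # hypothetical UTM 50N origin (lower-left corner)

def cell_index(easting, northing, x0=X0, y0=Y0):
    """Return the row-major region index of a projected point, or None if outside the grid."""
    col = int((easting - x0) // CELL_M)
    row = int((northing - y0) // CELL_M)
    if not (0 <= col < N_COLS and 0 <= row < N_ROWS):
        return None
    return row * N_COLS + col

n_regions = N_COLS * N_ROWS    # 11,776 regions in total
```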

Fig. 3 Shenzhen metropolitan area (Shenzhen, Guangzhou, Huizhou, and Dongguan)

We incorporated multi-modal datasets in five categories: historical commuting data, demographic data, house price data, GDP data, and land use data. The historical commuting dataset, extracted from mobile phone data provided by China Unicom, spans November 2019 and comprises 1,855.89 million commuting records. Each record details travel attributes (departure time, arrival time, origin location, destination location, travel distance, and travel purpose) alongside commuter attributes (age, gender, and income). These commuting records are captured on an hourly basis, with their origins and destinations located in the 2 km × 2 km regions. The travel purpose of each record is derived by categorizing origin and destination locations into residences, employment places, and other visited places. Experts from China Unicom define these place categories through data mining and spatiotemporal statistical methods. A trip is identified as commuting if one of its ends is categorized as the user’s residence and the other as the user’s place of employment. Metropolitan indicators associated with each region serve as local features for the corresponding commuting origins and destinations. Figure 4 illustrates the spatial distribution of average daily commuting generation in November 2019, and Fig. 5 depicts the daily spatial interaction intensity between regions with different land use types.

The demographic data, which share the spatial resolution of the commuting data, are also sourced from China Unicom’s mobile phone data. They include daily population statistics categorized across seven occupational groups, five educational levels, and three dwelling types. House price and GDP data for 2019 are collected from governmental public statistics specific to the urban districts. Land use data derived from the EULUC-China datasets (Gong et al. 2020) identify essential land use categories for urban parcels bounded by road networks in 2018.
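The commuting records can be turned into hourly production and attraction counts per region along the following lines. This is a sketch under assumptions: the field names are hypothetical, a trip is treated as commuting when one end is the residence and the other the employment place, and production is binned by departure hour while attraction is binned by arrival hour.

```python
from collections import Counter

def is_commute(origin_type, dest_type):
    """Assumed rule: one end is the user's residence, the other the employment place."""
    return {origin_type, dest_type} == {"residence", "employment"}

def hourly_generation(records):
    """Count commuting productions per (origin cell, hour) and attractions per (destination cell, hour)."""
    production, attraction = Counter(), Counter()
    for r in records:
        if is_commute(r["origin_type"], r["dest_type"]):
            production[(r["origin_cell"], r["depart_hour"])] += 1
            attraction[(r["dest_cell"], r["arrive_hour"])] += 1
    return production, attraction

# Hypothetical records, for illustration only.
records = [
    {"origin_type": "residence", "dest_type": "employment",
     "origin_cell": 10, "dest_cell": 42, "depart_hour": 7, "arrive_hour": 8},
    {"origin_type": "employment", "dest_type": "residence",
     "origin_cell": 42, "dest_cell": 10, "depart_hour": 18, "arrive_hour": 19},
    {"origin_type": "residence", "dest_type": "other",
     "origin_cell": 10, "dest_cell": 77, "depart_hour": 12, "arrive_hour": 12},
]
production, attraction = hourly_generation(records)
```

The third record is skipped because neither end is an employment place.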

Fig. 4 Spatial distribution of average daily commuting generation of SZMA in November 2019

Fig. 5 Daily spatial interaction intensity between the regions with different land use types in SZMA

Baseline models for comparison

To rigorously assess the effectiveness of MetroGCN in commuting forecasts, we conducted a comprehensive comparison with a set of established models spanning both traditional and cutting-edge approaches for traffic prediction. VAR and LSTM, representing classical time-series prediction models, serve as benchmarks in temporal forecasting (Hochreiter and Schmidhuber 1997; Zivot and Wang 2006). STGCN leverages optimized graph convolutions to capture dependencies in spatial and temporal dimensions (Yu et al. \(T_{s}\) set to 2, and the order of the Chebyshev polynomial K set to 3. Furthermore, to evaluate the model performance comprehensively, we adopted three established metrics: mean absolute error (MAE), root mean squared error (RMSE), and common part of commuters (CPC) (Lenormand et al. 2012). MAE and RMSE assess prediction errors in regression problems, with lower values indicating higher accuracy. CPC measures the agreement between predicted and true values and approaches one when they are identical.
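The three metrics can be computed as in the sketch below. The CPC form used here, twice the sum of element-wise minima divided by the sum of both totals, is the common definition; applying it to flattened vectors of non-negative commuting volumes is an assumption about how the metric is evaluated in this study, and the sample values are hypothetical.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def cpc(y_true, y_pred):
    """Common part of commuters: 2 * sum(min(true, pred)) / (sum(true) + sum(pred))."""
    denom = sum(y_true) + sum(y_pred)
    if denom == 0:
        return 0.0
    return 2.0 * sum(min(t, p) for t, p in zip(y_true, y_pred)) / denom

# Hypothetical commuting volumes for three regions.
y_true = [10.0, 20.0, 30.0]
y_pred = [12.0, 18.0, 30.0]
scores = (mae(y_true, y_pred), rmse(y_true, y_pred), cpc(y_true, y_pred))
```

For non-negative inputs, CPC lies between zero (no overlap) and one (identical volumes).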

Analysis and results

Performance analysis across model structures

Table 1 reports the average performance of the models over 10 independent experiments. The results show that models incorporating the GCN structure outperform those lacking graph representations of commuting topologies. Traditional models such as VAR and LSTM struggle to capture the nonlinear temporal dependencies and dynamic spatial interactions. Our proposed model, MetroGCN, achieves the lowest MAE and RMSE and the highest CPC, and thus the best performance in forecasting both commuting attraction and commuting production. ASTGCN, which adopts an attention mechanism akin to that of MetroGCN, ranks second, which highlights the importance of attention mechanisms in reducing the forecasting error of commuting generation. MetroGCN’s ability to perform multi-graph convolutions yields an average improvement of 29.64% across the three metrics compared to ASTGCN. MetroGCN also outperforms DGCNN, ranked third, by an average of 33.19% over the evaluation metrics. Although DGCNN is proficient in learning the dynamics of spatial dependencies, it falls short of MetroGCN in extracting hidden structural features within commuting patterns. Notably, MetroGCN also surpasses the diffusion-based GWN, the recurrent-based AGCRN, and the ChebNet-based STGCN. In summary, the multi-graph structure of MetroGCN proves beneficial in capturing diverse spatial dependencies, including both proximal and distant relationships, as well as global connections among regions sharing semantic similarities of commuting patterns. With its feature fusion capabilities and multi-graph convolution technique, MetroGCN markedly improves the accuracy of commuting generation forecasts. Note that the values of MAE and RMSE in traffic forecasts are inherently sensitive to the volume of actual commuting generation. The substantial commuting generation in the SZMA amplifies the forecasting disparities between MetroGCN and the benchmarks, demonstrating MetroGCN’s adeptness at managing varying scales of commuting data.
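One plausible way to obtain an average improvement across the three metrics is to average the relative gains, treating MAE and RMSE as lower-is-better and CPC as higher-is-better. The text does not state the exact formula, so the sketch below is an assumption, and the metric values are hypothetical rather than taken from Table 1.

```python
def avg_improvement(model, baseline):
    """Average relative improvement (in percent) over MAE, RMSE, and CPC.
    Assumed formula: error metrics improve when they decrease, CPC when it increases."""
    gains = [
        (baseline["MAE"] - model["MAE"]) / baseline["MAE"],
        (baseline["RMSE"] - model["RMSE"]) / baseline["RMSE"],
        (model["CPC"] - baseline["CPC"]) / baseline["CPC"],
    ]
    return 100.0 * sum(gains) / len(gains)

# Hypothetical scores, for illustration only.
baseline_scores = {"MAE": 10.0, "RMSE": 20.0, "CPC": 0.5}
model_scores = {"MAE": 8.0, "RMSE": 15.0, "CPC": 0.6}
improvement = avg_improvement(model_scores, baseline_scores)
```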

Table 1 Model performance of different model structures

Performance analysis across graph embeddings

The previous section described the superiority of the MetroGCN with TCPG-Base over other state-of-the-art models. To further explore how different TCPGs in the model impact the forecasting of commuting generation, this section analyzes the model performance from three perspectives: the evaluation metrics, the model stability, and the spatial–temporal distribution of forecasting results.

The evaluation metrics presented in Table 2 show that the MetroGCN model with TCPG-Age attains the best performance in forecasting both commuting attraction and production. Averaged over 10 independent experiments, TCPG-Age outperforms TCPG-Base by an average of 25.24% over the three metrics. The model incorporating TCPG-Income shows a 16.45% improvement over TCPG-Base, whereas TCPG-Gender yields a 7.4% improvement. These results underscore the efficacy of capturing spatial–temporal correlations through commuter portrait graph embeddings and show that commuting generation varies significantly with age, gender, and income. This supports the assertion that disparities in mobility and transport demands predominantly stem from socioeconomic differences (Rodrigue 2020). Information derived from commuters’ age and income improves the accuracy of commuting generation forecasts more than gender characteristics do.

To investigate the combined impact of the three commuter portraits on predicting commuting generation, we devised a scenario in which three commuter portrait graphs (TCPG-Age, TCPG-Gender, and TCPG-Income, collectively referred to as TCPG-AGI) were employed concurrently. The experimental methodology for this scenario follows the framework described in the “Experiments” section. We found that integrating the three commuter portrait graphs with the commuting distance and spatial interaction graphs significantly increased the model’s complexity. This was evident from the exponential growth in parameters and the considerable rise in memory usage, which ultimately impeded the model’s convergence. Because the importance of commuting distance and spatial interaction for commuting forecasts is well established in the literature (Zhou and Tang 2023), we focused on the less-explored aspects of commuter portraits. Thus, to address the convergence challenge posed by multi-graph convolutions, we excluded the commuting distance graph and the spatial interaction graph from the TCPG-AGI scenario. Consequently, the performance of TCPG-AGI was not as high as hypothesized; it was 2.82% lower than that of TCPG-Base. This underscores the irreplaceable role of commuting distance and spatial interaction in commuting forecasting.

Table 2 Model performance of different topological commuter portrait graphs

Model stability is a crucial indicator for assessing performance consistency. Figure 6 illustrates the performance variability across 10 experiments employing different TCPGs. Compared to the baseline TCPG-Base, models incorporating TCPG-Age, TCPG-Gender, and TCPG-Income demonstrate significant stability improvements. Notably, the TCPG-Age model exhibits the least variability in MAE, RMSE, and CPC across all experiments, underscoring its superior stability over models employing other TCPGs. The TCPG-Income model is more stable than the TCPG-Gender model. However, the TCPG-AGI model is less stable than the TCPG-Base model. We observed consistent results in the evaluation of model stability for forecasting commuting attraction and production.

Fig. 6 Model stability based on different topological commuter portrait graphs

Figures 7 and 8 illustrate the spatial divergence in model performance when forecasting commuting attraction and commuting production with different TCPGs. For regions within the Shenzhen metropolitan area, we calculated the reduction in RMSE relative to the model with TCPG-Base using workdays from the testing dataset. MetroGCN performed consistently across these days, so we present November 8, 2019, as an example. We considered commuting patterns during morning rush hours (7–9 am) and evening rush hours (6–8 pm). In Fig. 7, which shows commuting attraction forecasting, the model integrating TCPG-Age achieves the largest reduction (30–40%), followed by TCPG-Income (20–30%) and TCPG-Gender (10–20%). The regions experiencing RMSE reduction expand significantly during the evening rush, coinciding with heightened commuting attraction. During morning rush hours, elevated RMSE reductions are evident in central business districts, whereas during the evening rush, larger reductions occur in suburban areas close to city junctions. In Fig. 8, which shows commuting production forecasting, the TCPG-Age model yields substantial RMSE reductions (30–40%) over widespread regions, outperforming TCPG-Gender and TCPG-Income, whose RMSE reductions fall predominantly within the 1–20% range. Notably, TCPG-Income reduces RMSE over a larger total area than TCPG-Gender. Unlike the patterns in commuting attraction forecasts, larger RMSE reductions appear in suburban regions during morning rush hours, whereas central business districts exhibit more pronounced RMSE reductions during the evening.
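The per-region RMSE reduction mapped in Figs. 7 and 8 can be sketched as follows. The array layout (hourly slots by regions), the convention that slot `h` covers the hour starting at `h`:00 (so that 7–9 am corresponds to slots 7 and 8, and 6–8 pm to slots 18 and 19), and the toy values are all assumptions made for illustration.

```python
import numpy as np

def rmse_per_region(y_true, y_pred):
    """RMSE over time for each region; inputs have shape (time slots, regions)."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2, axis=0))

def rmse_reduction(y_true, pred_base, pred_model, hours, window):
    """Percentage RMSE reduction of a model relative to the TCPG-Base model within a time window."""
    mask = np.isin(hours, window)
    base = rmse_per_region(y_true[mask], pred_base[mask])
    model = rmse_per_region(y_true[mask], pred_model[mask])
    with np.errstate(divide="ignore", invalid="ignore"):
        # Regions with zero baseline error have no defined reduction.
        return np.where(base > 0, 100.0 * (base - model) / base, np.nan)

MORNING_RUSH, EVENING_RUSH = [7, 8], [18, 19]   # assumed hourly slots

# Toy data: one day, two regions; the baseline errs by 2 and the model by 1 everywhere.
hours = np.arange(24)
y_true = np.full((24, 2), 100.0)
pred_base = y_true + 2.0
pred_model = y_true + 1.0
reduction_am = rmse_reduction(y_true, pred_base, pred_model, hours, MORNING_RUSH)
reduction_pm = rmse_reduction(y_true, pred_base, pred_model, hours, EVENING_RUSH)
```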

Fig. 7 Row 1: the true value of the commuting attraction in SZMA on a sample work day. Rows 2–4: the spatial distribution of RMSE reduction of the models with TCPGs compared to the TCPG-Base model

Fig. 8 Row 1: the true value of the commuting production in SZMA on a sample work day. Rows 2–4: the spatial distribution of RMSE reduction of the models with TCPGs compared to the TCPG-Base model

A plausible interpretation of these findings is that the metropolitan spatial structure resembles a hierarchical polycentric urban system. Suburban regions predominantly comprise residential zones that have comparatively lower living costs and attract larger populations (Acheampong 2020). These regions function as the origin of morning commuting and the destination of evening commuting. The recurrent movements of commuters indicate robust spatial dependencies among residential regions (Hincks and Wong 2010), which lead to significant RMSE reductions in predicting morning commuting production and evening commuting attraction. Conversely, the central business districts, which provide workplaces, operate as the destination of morning commuting and the origin of evening commuting. The frequent commuting activities signify strong spatial correlations among working regions (Muñiz and Garcia-López 2019), thereby leading to substantial RMSE reductions in the forecasts of morning commuting attraction and evening commuting production.

Figure 9 depicts the temporal variation in model performance when forecasting commuting generation with different TCPGs. Using a spatial unit within the Shenzhen metropolitan area as an illustrative example, we selected November 8, 2019, as a representative workday in the testing dataset. Generally, MetroGCN performs better during morning and evening rush hours than during other periods. Moreover, the model’s performance tends to show more pronounced improvements during the daytime than during the nighttime. For both commuting attraction and production, the predictions of the TCPG-Age model lie closer to the true values than those of TCPG-Gender and TCPG-Income. In addition, the TCPG-Income model outperforms the TCPG-Gender model in capturing temporal changes in commuting generation.

Fig. 9 The temporal prediction and true values of a sample spatial unit in SZMA on a sample workday. Each row represents the result of the model with different topological commuter portrait graphs

Performance analysis across metropolitan features

In investigating the contributions of various features to the MetroGCN model for commuting generation forecasting, we conducted an ablation analysis across different feature groups. Table 3 summarizes the average model performance over 10 experiments with distinct feature groups systematically ablated. For commuting attraction forecasts, the results highlight the greater impact of socioeconomic features than of demographic features. Ablating housing price and land use features from MetroGCN resulted in RMSE increases of 29.19% and 22.13%, respectively, surpassing the impact of ablated GDP features. The removal of demographic features led to an average RMSE increase of 10.47%. This underscores the significance of considering housing price and land use as crucial features in commuting attraction forecasting. Conversely, for commuting production forecasting, demographic features influence the model performance more than socioeconomic features. Ablating the population of diverse occupations and educational levels from MetroGCN resulted in RMSE increases of 31.98% and 27.06%, respectively, surpassing the impact of the ablated population by different dwelling types. The ablation of socioeconomic features led to an average RMSE increase of 13.77%. These findings emphasize the importance of demographic structures related to occupation and educational level as vital features in commuting production forecasting.

Table 3 Model performance of different metropolitan features

To ascertain the statistical significance of variations in model performance attributed to metropolitan features, we employed a T-test comparing MetroGCN with models featuring ablations. Table 4 presents the T-test results, including T-values and significance levels. Negative T-values indicate a lower mean performance metric when the corresponding feature is included, whereas positive T-values indicate a higher mean performance metric with the inclusion of the feature. The T-values for MAE and RMSE are consistently negative, whereas the T-values for CPC are positive, indicating that metropolitan features are effective in reducing forecasting errors. At a significance level of 0.05, we deemed the observed differences in model performance resulting from feature ablation statistically significant. Notably, the T-value magnitude is largest for housing prices in commuting attraction forecasts and for the demographic occupation feature in commuting production forecasts, emphasizing the critical roles of these features in forecasting commuting attraction and production, respectively.
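The sign convention above can be illustrated with a two-sample T statistic computed from repeated runs. The text does not say which variant of the T-test was used, so Welch's unequal-variance form is assumed here, and the run scores are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(full_runs, ablated_runs):
    """Welch's t statistic and degrees of freedom for (full model) minus (ablated model).
    A negative t means the metric is lower on average when the feature is included."""
    m1, m2 = mean(full_runs), mean(ablated_runs)
    v1, v2 = variance(full_runs), variance(ablated_runs)   # sample variances
    n1, n2 = len(full_runs), len(ablated_runs)
    se2 = v1 / n1 + v2 / n2
    t = (m1 - m2) / math.sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

# Hypothetical RMSE values from repeated runs, for illustration only.
rmse_full = [1.0, 2.0, 3.0, 4.0, 5.0]
rmse_ablated = [3.0, 4.0, 5.0, 6.0, 7.0]
t_value, dof = welch_t(rmse_full, rmse_ablated)
```

Here the statistic is negative because the full model has the lower mean RMSE. The corresponding p-value would come from the t distribution with `dof` degrees of freedom, for example via `scipy.stats.ttest_ind(..., equal_var=False)`.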

Table 4 T-test result between MetroGCN and its feature components

Conclusion

This paper addresses the emerging challenge of accurately forecasting commuting generation in metropolitan areas, a critical aspect of deciphering the intricate relationship between routine human mobility and the evolution of urban organization. The insights derived from such forecasting both contribute to a comprehensive understanding of metropolitan commuting and offer valuable guidance to policymakers in shaping efficient transportation systems and configuring urban infrastructure. Commuting forecasts pose greater challenges than general traffic predictions, given the complexity of commuting activities and the diversity of commuter attributes. In response to these challenges, we propose a metropolis-informed graph convolutional network (MetroGCN) that surpasses state-of-the-art methods in forecasting commuting generation. MetroGCN’s innovation lies in its multi-faceted approach. First, its use of a multi-graph representation for diverse commuter portraits enables the model to discern semantic spatial correlations from individual characteristics. Second, the incorporation of new dimensions, including temporal demographic structures, enhances the model’s adaptability to the commuting dynamics. Third, the scalable spatial–temporal attention blocks endow the model with robust generalization capabilities, which are particularly crucial in the context of large commuting networks.

Beyond MetroGCN’s technical contributions, extensive experiments on Shenzhen metropolitan area datasets yielded noteworthy theoretical insights. The model’s superior performance with TCPG-Age indicates the importance of incorporating commuter age in spatial interaction modeling for commuting forecasts, with TCPG-Income and TCPG-Gender following suit. In addition, the T-test outcomes emphasize the significant influence of factors such as housing prices and land use distribution on predicting commuting attraction, and they highlight the pivotal role of demographic elements such as occupation and educational level in forecasting commuting production. These findings provide valuable guidance for effective feature organization in commuting generation forecasts. Nevertheless, the computational demands of many GCN techniques, including our method, can pose challenges in real-world scenarios because of their potential overparameterization. Future research may explore strategies to streamline the architecture of MetroGCN, enhancing its computational efficiency and expanding its applicability to a broader range of traffic scenarios in metropolitan areas.