Abstract
As an important part of operations planning for intelligent transportation systems, modeling and forecasting the distribution of passenger flow can guide urban rail transit operators in formulating reasonable scheduling plans. Because traffic passenger flow data are complex, multivariate, and non-stationary, accurate passenger flow prediction is difficult. This paper proposes MTS-Informer, a causal-convolution self-attention framework for traffic passenger flow prediction built on a convolutional neural network. The method follows the changing law of auxiliary variables, applies a stabilization step to reduce the non-stationarity of the original sequence, and uses causal convolution to strengthen the self-attention mechanism's ability to extract local information from the input sequence. An attenuation mechanism within self-attention allows the model to learn features analogous to differencing the original sequence. In addition, a stationarity test of the original sequence data is included. Experimental results show that the fit to the sample data improves significantly, with standard error reduced by 10–40%, verifying the effectiveness of the proposed modeling technique. The model achieves higher prediction accuracy and operating efficiency and can provide a basis for urban traffic passenger flow prediction.
![Fig. 1](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-023-09003-z/MediaObjects/521_2023_9003_Fig1_HTML.png)
![Fig. 2](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-023-09003-z/MediaObjects/521_2023_9003_Fig2_HTML.png)
![Fig. 3](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-023-09003-z/MediaObjects/521_2023_9003_Fig3_HTML.png)
![Fig. 4](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-023-09003-z/MediaObjects/521_2023_9003_Fig4_HTML.png)
![Fig. 5](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-023-09003-z/MediaObjects/521_2023_9003_Fig5_HTML.png)
![Fig. 6](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-023-09003-z/MediaObjects/521_2023_9003_Fig6_HTML.png)
![Fig. 7](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-023-09003-z/MediaObjects/521_2023_9003_Fig7_HTML.png)
Data Availability
The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This work is supported by open project of State Key Laboratory of Integrated Automation of Process Industry, Northeastern University (No. 2020052).
Ethics declarations
Conflict of interest
No conflict of interest exists in the submission of this manuscript, and the manuscript has been approved by all authors for publication. On behalf of my co-authors, I declare that the work described is original research that has not been published previously and is not under consideration for publication elsewhere, in whole or in part.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A
To clarify the causal convolutional deep neural network structure in Sect. 3.2, a detailed description of its internals is given here:
A causal convolutional network consists of an input layer, hidden layers, and an output layer, as shown in Fig. 8. Each layer uses the same type of neuron, and consecutive layers are connected non-fully; this restricted connectivity is realized through a mask.
The input layer receives the feature vectors, the hidden layers extract information from the raw data, and the output layer produces the results. In this work, the self-attention mechanism of the built-in causal convolutional neural network plays a crucial role in capturing the local characteristics of the input time series.
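The causality property described above can be illustrated with a minimal sketch (not the paper's implementation): a 1-D convolution is made causal by left-padding the input with \(k-1\) zeros, so each output depends only on the current and past inputs.

```python
import numpy as np

def causal_conv1d(x, w):
    """1-D causal convolution: output[t] depends only on x[t-k+1..t].

    Causality is enforced by left-padding the input with k-1 zeros,
    so no future value can leak into the present output. This plays
    the same role as the connection mask described in the text.
    """
    k = len(w)
    xp = np.concatenate([np.zeros(k - 1), x])  # pad on the left only
    # w is reversed so w[0] multiplies the current sample x[t]
    return np.array([xp[t:t + k] @ w[::-1] for t in range(len(x))])

# A unit impulse at t=2 can only influence outputs at t >= 2.
x = np.zeros(8)
x[2] = 1.0
y = causal_conv1d(x, np.array([0.5, 0.3, 0.2]))
```

Here the impulse response appears only at and after the impulse time, which is exactly the masked (non-full) connectivity the appendix describes.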
A supplementary explanation of the sparse probabilistic self-attention algorithm mentioned in Sect. 3.2:
Let \(q_i\), \(k_j\), and \(v_i\) denote the \(i\)-th (respectively \(j\)-th) rows of Q, K, and V. Let \(p(k_j \mid q_i)\) denote the attention probability distribution of the \(i\)-th query over all keys, and let \(q(k_j \mid q_i) = 1/L_K\) denote the uniform distribution over the keys.
The KL divergence between the two distributions p and q measures the sparsity of the query. Following the Informer formulation, the discrete KL divergence reduces to
\[
KL(q \,\|\, p) = \ln \sum_{l=1}^{L_K} e^{\frac{q_i k_l^{\top}}{\sqrt{d}}} - \frac{1}{L_K}\sum_{j=1}^{L_K}\frac{q_i k_j^{\top}}{\sqrt{d}} - \ln L_K .
\]
Substituting the attention distribution and the uniform distribution of the query into the KL divergence and dropping the constant \(\ln L_K\) yields the approximate (max-mean) expression for the sparsity of the \(i\)-th query:
\[
\bar{M}(q_i, K) = \max_{j}\left\{\frac{q_i k_j^{\top}}{\sqrt{d}}\right\} - \frac{1}{L_K}\sum_{j=1}^{L_K}\frac{q_i k_j^{\top}}{\sqrt{d}} .
\]
In this formula, the first term takes the maximum over the scaled inner products of the \(i\)-th query with all keys, and the second term is their arithmetic mean. The larger the difference between the two, the farther p is from the uniform distribution q. Within a selection interval whose size is set by a sampling parameter, only the queries with the highest-ranked sparsity measurements are retained. Consequently, for a sequence of length L, the sparse probabilistic self-attention mechanism needs only \(O(L\log L)\) dot-product operations to compute the similarity between Q and K. By contrast, the dot products of conventional self-attention cost \(O(L^2)\) time and memory per layer, and stacking n encoder–decoder layers for long sequence inputs costs \(O(n \cdot L^2)\) in total. With the above improvements to the network components, probabilistic sparse self-attention achieves \(O(L\log L)\) in both time complexity and memory usage.
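The max-mean sparsity measurement and the top-query selection can be sketched as follows. This is a simplified illustration (the full Informer algorithm also subsamples the keys to reach \(O(L\log L)\); here all keys are scored, and the variable names are illustrative):

```python
import numpy as np

def sparsity_measure(Q, K):
    """Max-mean query sparsity measurement:
    M(q_i, K) = max_j(q_i k_j / sqrt(d)) - mean_j(q_i k_j / sqrt(d)).

    A query with a large M has a peaked (far-from-uniform) attention
    distribution over the keys and is worth computing exactly; the
    remaining queries can be skipped or approximated.
    """
    d = Q.shape[1]
    S = (Q @ K.T) / np.sqrt(d)        # scaled dot-product scores
    return S.max(axis=1) - S.mean(axis=1)

rng = np.random.default_rng(0)
L, d = 64, 16
Q = rng.normal(size=(L, d))
K = rng.normal(size=(L, d))

u = int(np.ceil(np.log(L)))           # keep ~log L "active" queries
top = np.argsort(sparsity_measure(Q, K))[-u:]
```

Restricting exact attention to these `u ~ log L` dominant queries is what brings the dot-product cost down from \(O(L^2)\) toward \(O(L\log L)\).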
Appendix B
The visualization results of the normal distribution tests in the experiments are supplemented as follows:
The cumulative probability method (Fig. 9) plots a scatter diagram of the cumulative probability of the predictor variable against the cumulative probability of a specified theoretical distribution; both the x-axis (the sample's cumulative probability) and the corresponding y-axis lie in (0, 1). The quantile method (Fig. 10) instead plots quantiles: the x-axis shows the quantiles of the expected population distribution, and the y-axis shows the quantiles of the empirical distribution of the predicted sample. We use these two graphical methods for normality testing with the expected distribution set to the standard normal; one examines whether the points fall on the line \(y=x\), where the intercept of the fitted line corresponds to the mean and the slope to the standard deviation. If the population underlying the sample is normally distributed, the scatter points in the P-P and Q-Q diagrams should fall near the \(45^{\circ }\) line through the origin.
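The Q-Q construction can be sketched numerically (a minimal stdlib sketch, not the paper's plotting code; the helper name and plotting positions \((i+0.5)/n\) are illustrative choices):

```python
import random
import statistics

def qq_points(sample):
    """Quantile-quantile pairs: theoretical standard-normal quantiles
    (x-axis) versus empirical quantiles of the standardized sample
    (y-axis). For normally distributed data the points fall near y = x.
    """
    n = len(sample)
    mu = statistics.fmean(sample)
    sd = statistics.stdev(sample)
    # Standardize and sort: the sorted values are the empirical quantiles.
    emp = sorted((v - mu) / sd for v in sample)
    nd = statistics.NormalDist()
    # Plotting positions (i + 0.5)/n mapped through the inverse CDF.
    theo = [nd.inv_cdf((i + 0.5) / n) for i in range(n)]
    return theo, emp

random.seed(42)
theo, emp = qq_points([random.gauss(5.0, 2.0) for _ in range(2000)])
# For this normal sample, (theo[i], emp[i]) should hug the 45-degree line.
```

Plotting `emp` against `theo` reproduces the Q-Q diagram of Fig. 10; the P-P diagram of Fig. 9 is the analogous construction applied to cumulative probabilities rather than quantiles.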
The optimization operator is particularly critical to the self-attention performance improvements in this paper, and this group of control experiments expresses its advantage intuitively and clearly. Introducing the optimization operator has a marked impact on the self-attention mechanism: its prediction performance is generally better than that of the basic model, which is demonstrated across the various data sets (Fig. 11).
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, M., Wang, W., Hu, X. et al. Multivariate long-time series traffic passenger flow prediction using causal convolutional sparse self-attention MTS-Informer. Neural Comput & Applic 35, 24207–24223 (2023). https://doi.org/10.1007/s00521-023-09003-z
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00521-023-09003-z