
Multivariate long-time series traffic passenger flow prediction using causal convolutional sparse self-attention MTS-Informer

  • Original Article
  • Published in: Neural Computing and Applications

Abstract

As an important part of the operation planning process in intelligent transportation systems, the distribution law and forecast of passenger flow can guide urban rail transit operators in formulating reasonable operation scheduling plans. Because traffic passenger flow data are complex, multivariate, and unstable, accurate passenger flow prediction is difficult. Building on convolutional neural networks, this paper proposes MTS-Informer, a causal-convolution sparse self-attention framework for traffic passenger flow prediction. The method follows the changing law of auxiliary variables, applies a stabilization method to reduce the instability of the original sequence, and uses causal convolution to improve the self-attention mechanism's ability to extract local information from the input sequence. The weakening effect of the self-attention mechanism ensures that it can learn features analogous to the differenced features of the original sequence data. In addition, stationarity detection of the original sequence data is added. Experimental results show that the fit to the sample data improves significantly and the standard error decreases by 10–40%, verifying the effectiveness of the proposed modeling technique. The model achieves higher prediction accuracy and operating efficiency and can provide a basis for urban traffic passenger flow prediction.



Data Availability

The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

This work is supported by the open project of the State Key Laboratory of Integrated Automation of Process Industry, Northeastern University (No. 2020052).

Author information

Corresponding author

Correspondence to Wei Wang.

Ethics declarations

Conflict of interest

No conflict of interest exists in the submission of this manuscript, and the manuscript has been approved by all authors for publication. I declare on behalf of my co-authors that the work described is original research that has not been published previously and is not under consideration for publication elsewhere, in whole or in part. All listed authors have approved the enclosed manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A

To complement the causal convolutional deep neural network structure introduced in Sect. 3.2, a more detailed description is given here:

A causal convolutional network consists of an input layer, hidden layers, and an output layer, as shown in Fig. 8. Each layer uses the same type of neurons, and adjacent layers are not fully connected; this sparse connectivity is realized through a mask.

Fig. 8

The input layer receives the feature vectors, the hidden layers extract information from the raw data, and the output layer produces the results. In this work, the self-attention mechanism with the built-in causal convolutional neural network plays a crucial role in capturing the local characteristics of the input time series.
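The following is a minimal PyTorch sketch of such a masked causal convolution. The layer sizes, names, and toy input are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """1-D convolution that attends only to current and past time steps.

    Causality is enforced by left-padding the input with (kernel_size - 1)
    zeros, so the output at time t never depends on inputs after t.
    """
    def __init__(self, in_channels: int, out_channels: int, kernel_size: int):
        super().__init__()
        self.pad = kernel_size - 1  # pad only on the left
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        x = nn.functional.pad(x, (self.pad, 0))  # left padding keeps causality
        return self.conv(x)

# Toy usage: a multivariate series with 4 variables and 96 time steps.
x = torch.randn(8, 4, 96)
layer = CausalConv1d(in_channels=4, out_channels=16, kernel_size=3)
y = layer(x)  # shape (8, 16, 96); y[..., t] uses only x[..., :t+1]
```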

A supplementary explanation of the sparse probabilistic self-attention algorithm mentioned in Sect. 3.2:

$$A(q_i,K,V)=\sum _{j=1}^{L_k}\frac{k(q_i,k_j)}{\sum _{l}k(q_i,k_l)}v_j=\mathbb {E}_{p(k_j\vert q_i)}[v_j]$$
(1)
$$p(k_j\vert q_i)=\frac{k(q_i,k_j)}{\sum _{l}k(q_i,k_l)}$$
(2)
$$q(k_j\vert q_i)=\frac{1}{L_k}$$
(3)

Here, \(q_i\), \(k_i\), and \(v_i\) denote the i-th rows of Q, K, and V, respectively; \(p(k_j\vert q_i)\) is the attention probability distribution of the i-th query over all keys, and \(q(k_j\vert q_i)\) is the corresponding uniform distribution.

The KL divergence between the two distributions P and Q measures the sparsity of a query. Its discrete and continuous forms are defined as follows:

$$D(P\Vert Q)=\sum _{i\in x}P(i)\log \frac{P(i)}{Q(i)}$$
(4)
$$D(P\Vert Q)=\int _x P(x)\log \frac{P(x)}{Q(x)}\,dx$$
(5)

Substituting the query's attention probability distribution and the uniform distribution into the KL divergence yields the following approximate expression for the sparsity of a query:

$$KL(q\Vert p)=\max _{j}\{q_i k_j^{T}\cdot d^{*}\}-\frac{1}{L_k}\sum _{j=1}^{L_k}q_i k_j^{T}\cdot d^{*}$$
(6)

In this formula, the first term computes the inner products of the i-th query with all keys and takes the maximum; the second term is their arithmetic mean. The larger the gap between the two, the greater the difference between p and q. Within the set selection interval, whose size is determined by the sampling parameters, the queries with the highest-ranked gaps are selected. Consequently, for a time series of length L, computing the similarity between Q and K with the sparse probabilistic self-attention mechanism requires only \(O(L\log L)\) dot-product operations. The dot products of traditional self-attention cost \(O(L^2)\) time and memory per layer, and stacking n encoder-decoders for long-sequence inputs costs \(O(n\cdot L^2)\) in total. By improving the network components as described above, the probabilistic sparse self-attention mechanism achieves \(O(L\log L)\) time complexity and memory usage.
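As a rough illustration of Eq. (6), the sketch below scores each query by the gap between the maximum and the mean of its scaled dot products with the keys, then runs full attention only for the top-u queries; the remaining queries fall back to the mean of V, mirroring the uniform distribution q. The \(1/\sqrt{d}\) scaling in place of \(d^{*}\) and the omission of key sampling are simplifying assumptions, so this is not the full \(O(L\log L)\) implementation.

```python
import torch

def query_sparsity(Q: torch.Tensor, K: torch.Tensor) -> torch.Tensor:
    # Q: (L_q, d), K: (L_k, d). Per-query Eq. (6) measure: the max of the
    # scaled dot products with all keys minus their arithmetic mean.
    scores = Q @ K.T / Q.shape[-1] ** 0.5      # (L_q, L_k)
    return scores.max(dim=-1).values - scores.mean(dim=-1)

def sparse_attention(Q: torch.Tensor, K: torch.Tensor, V: torch.Tensor, u: int):
    """Full attention for the u queries with the largest sparsity measure;
    all other queries receive the mean of V (the uniform-distribution case)."""
    top = query_sparsity(Q, K).topk(u).indices
    out = V.mean(dim=0, keepdim=True).expand(Q.shape[0], -1).clone()
    attn = torch.softmax(Q[top] @ K.T / Q.shape[-1] ** 0.5, dim=-1)
    out[top] = attn @ V
    return out

# Toy usage: 128 queries/keys of width 32; keep the 16 most "active" queries.
Q, K, V = (torch.randn(128, 32) for _ in range(3))
y = sparse_attention(Q, K, V, u=16)  # shape (128, 32)
```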

Appendix B

In the experiment, the visualization results of the sequence normality tests are supplemented as follows:

Fig. 9 Normality test P-P diagram

Fig. 10 Normality test Q-Q plot

The cumulative probability method (P-P plot, Fig. 9) is a scatter diagram that plots the cumulative probability of the predicted variable against the cumulative probability of the specified theoretical distribution; the x-axis carries the sample's cumulative probability, and both axes range over (0, 1). The quantile method (Q-Q plot, Fig. 10) places the quantiles of the expected theoretical distribution on the x-axis and the quantiles of the empirical distribution of the predicted sample on the y-axis. We apply both graphical methods to the normality test with the expected distribution set to the standard normal distribution and examine whether the points fall on the line \(y=x\); the intercept of the fitted line gives the mean, and its slope gives the standard deviation. If the population underlying the sample is normally distributed, the scatter points in the P-P and Q-Q plots should fall near the \(45^{\circ }\) line through the origin.
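For reference, a minimal Q-Q check of this kind can be run with scipy.stats.probplot; the synthetic sample below is a placeholder for the passenger flow series, and the fitted line's slope and intercept recover the standard deviation and mean as described above.

```python
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

# Placeholder sample standing in for the predicted passenger flow series.
sample = np.random.default_rng(0).normal(loc=5.0, scale=2.0, size=500)

# probplot plots sample quantiles against standard normal quantiles and
# fits a line whose slope/intercept estimate the sample's std/mean.
(osm, osr), (slope, intercept, r) = stats.probplot(sample, dist="norm", plot=plt)
print(f"slope (std) = {slope:.2f}, intercept (mean) = {intercept:.2f}, r = {r:.3f}")
plt.title("Normality test Q-Q plot")
plt.show()
```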

The optimization operator is particularly critical to improving the self-attention performance in this paper, and this group of control experiments shows its advantages intuitively and clearly. Introducing the optimization operator has a marked impact on the self-attention mechanism: its prediction results are generally better than those of the basic model, an effect demonstrated across the various data sets (Fig. 11).

Fig. 11 Optimization operator control experiment group

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Liu, M., Wang, W., Hu, X. et al. Multivariate long-time series traffic passenger flow prediction using causal convolutional sparse self-attention MTS-Informer. Neural Comput & Applic 35, 24207–24223 (2023). https://doi.org/10.1007/s00521-023-09003-z


  • DOI: https://doi.org/10.1007/s00521-023-09003-z
