1 Introduction

Time-series prediction plays an important role in the analysis of future markets, meteorology, pandemics, and other industrial/customer applications, in which historical statistics are treated as time-series signals for building computational prediction models. Related tasks can be categorized into single-step [1], multi-step [2], multi-timescale [3], and multivariate [4] prediction, as well as missing data imputation [5]. Many machine learning techniques have been proposed, including statistical methods [6], artificial neural networks [7], support vector machines (SVM) [8], and other approaches.

To further improve the prediction performance on specific tasks, researchers design advanced methods according to the features of the time-series data. For example, financial data are characterized by volatility, irregularity, noise, and inertia, so modified fuzzy systems [9], hybrid neural networks [10, 11], and advanced filters [12, 13] have proven effective. Wind power data are inherently intermittent and random, so researchers have studied data pre-processing techniques [14] and mined spatial correlation to improve prediction accuracy [15].

Existing prediction approaches are thus proposed for specific application scenarios. According to the “no free lunch” theorem [16], an approach that performs well on some time-series scenarios must perform poorly on others. How can these approaches be evaluated fairly for general prediction tasks? A comprehensive set of time-series benchmarks is necessary to cover most data characteristics, and an online competition can be held by inviting volunteer researchers to solve the same benchmarks. The performance of each competing approach can then be analyzed, and the overall rankings indicate which approach is appropriate for a given kind of characteristic. In the M3 competition [17], the organizers summarized several previous competitions and proposed a business-related dataset of 3003 benchmarks, which include micro, industrial, and macro scenarios with different time scales, e.g., yearly, quarterly, and so on. The results of the M3 competition illustrate that time-series prediction algorithms perform differently on different datasets [18]. The M3 organizers intended this large-volume dataset to support such analysis, but neglected the fairness of feature evaluation. In the M4 competition, the organizers continued to increase the volume of the dataset [19]. As extensions of M3 and M4, the NN3 and NN5 competitions focus on evaluating prediction approaches through single- or double-timescale datasets [20, 21].

In addition, certain datasets have been proposed to promote the development of related research techniques [22]. Pandemic prediction, which concerns saving millions of lives, is one of the hot research areas, and various datasets have been presented to support researchers [23,24,25,26]. Besides, as energy topics receive continuous attention from society, research on energy demand and consumption is rapidly increasing. Hence, many energy-related datasets have been proposed to help researchers study the improvement of energy efficiency [27,28,29,30]. Apart from the aforementioned areas, many data-driven methods also have related time-series datasets for researchers, e.g., solar flare research [31], plant detection [32], remote sensing [33], and waste streams [34].

However, existing datasets and competitions have only considered large-volume datasets to evaluate the performance of prediction approaches, and few studies have categorized the features of time-series data, which hinders characteristic analysis. In fact, prediction performance varies significantly with data characteristics, especially for deep learning approaches [35]. Neural network-based methods, widely adopted in time-series prediction, are sensitive to low-frequency data features [36]. To further improve the prediction performance of neural networks, their ability to learn high-frequency features should be enhanced [37]. However, for the datasets proposed in time-series competitions, frequency features are seldom considered and categorized. Moreover, for complex time-series prediction, although certain studies decompose the data in the frequency domain [38,39,40,41], they ignore the impact of different frequency features on their approaches. Therefore, a new method based on the finite impulse response (FIR) filter is developed to generate a set of 16 benchmarks, which is used in the time-series competition held at the 2022 international conference of neural computing with advanced applications (NCAA2022).

The NCAA2022 dataset, comprising 16 time-series prediction benchmarks, is proposed to emphasize the impact of frequency characteristics on different prediction models. In the NCAA2022 competition dataset, 4 original time-series instances are collected, and each instance is transformed into low-pass, high-pass, band-pass, and band-stop benchmarks. A new generation method is proposed based on the adaptive design of the FIR filter, and the transform process is simplified by sub-matrices to reduce the computation cost.

The rest of this paper is organized as follows: The original time-series instances are introduced in Sect. 2. In Sect. 3, the generation methodology of new benchmarks is proposed. In Sect. 4, the evaluation metric of NCAA2022 dataset is described, and the baseline performance and comparative results are presented. Finally, the paper is concluded in Sect. 5.

2 Original instances

The original instances are required to reflect comprehensive characteristics, so coronavirus (COVID-19) pandemic data, Dow Jones Industrial Average indices, and wind speed data are chosen as original instances, which are available in [42,43,44]. Furthermore, the Lorenz system, a classical benchmark for time-series prediction [45, 46], is also chosen to incorporate chaotic characteristics.

Market prediction is a hot topic worldwide, and many machine-learning approaches are applied to predict the indices. Generally, there is a nonlinear relation between future and past indices, so approaches that fit this nonlinear relation are widely applied [47, 48].

The closing indices of the Dow Jones Industrial Average from December 25, 1979, to November 17, 2021, are chosen as the stock raw data [43]. The stock raw data are one-dimensional with a length of 10567. Researchers in market prediction mainly consider historical indices [49, 50], so other related factors are neglected in the stock raw instance.

With the increasing concern about greenhouse effects, renewable energy prediction has become key to improving the efficiency of power generation. Due to the volatility of wind speed, the prediction of wind power is complicated and challenging. Existing approaches can be categorized into physical and statistical models [51]. To handle volatility, most studies combine neural networks with data processing methods [52].

The wind speed raw data consist of wind speed data collected at two sites, i.e., Ames and Dubuque, from January 1, 2010, to December 28, 2011. The wind data have 2 dimensions, and each dimension has a length of 15000 [44]. Both sites are located in Iowa, USA, and there is a spatial correlation of wind speed between them. Certain advanced machine learning models have taken advantage of this spatial correlation to improve the prediction performance [53], so data from both sites are provided as references in the NCAA2022 competition.

To reduce the influence of the COVID-19 outbreak, researchers are focusing on prediction and control methods for pandemic spreading. Due to the short duration of COVID-19, there is a limited amount of pandemic-related data. As a result, the prediction approaches need to be capable of few-shot or transfer learning [54, 55].

The numbers of daily confirmed cases in the USA, UK, Canada, and Germany from January 22, 2020, to November 20, 2021, are chosen as the pandemic raw data [42]. Thus, the COVID-19 data have 4 dimensions, and each dimension has a length of 669. As the confirmed data are correlated across geographical regions [56], COVID-19 data from all 4 countries are given for reference in the NCAA2022 competition.

Chaotic systems commonly exist in the real world, so chaos is considered in time-series prediction approaches [57, 58]. The Lorenz system, which is nonlinear and sensitive to initial conditions, is chosen as the chaotic raw data. The canonical form of the Lorenz system is expressed as

$$\begin{aligned} \left\{ \begin{aligned} {\dot{z}}_1(t) =&a( z_2(t) - z_1(t) )\\ {\dot{z}}_2(t) =&z_1(t)( b - z_3(t) ) - z_2(t)\\ {\dot{z}}_3(t) =&z_1(t)z_2(t) - cz_3(t),\\ \end{aligned} \right. \end{aligned}$$
(1)

where a, b, and c are chaotic parameters and \(z_1(t), z_2(t), z_3(t)\) are the three dimensions of the Lorenz system at time t. In this paper, the parameters and initial conditions are set as \(a = 10, b = 28, c = 3, z_1(0) = 0, z_2(0) = 1, z_3(0) = 0\). The chaotic raw data have 3 dimensions. The sampling period and duration are 0.01 s and 150 s, respectively, so each dimension has a length of 15000. The chaotic data are simulated based on the Runge–Kutta solutions of the Lorenz system [59].
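The simulation described above can be sketched as follows. This is a minimal illustration that assumes the classical fourth-order Runge–Kutta scheme with a fixed step equal to the sampling period, and reads the product in the third equation as \(z_1(t)z_2(t)\); the function names `lorenz_rhs` and `simulate_lorenz` are hypothetical and not taken from the competition code.

```python
import numpy as np

def lorenz_rhs(z, a=10.0, b=28.0, c=3.0):
    """Right-hand side of the Lorenz system (1); parameter values follow the text."""
    z1, z2, z3 = z
    return np.array([a * (z2 - z1), z1 * (b - z3) - z2, z1 * z2 - c * z3])

def simulate_lorenz(z0=(0.0, 1.0, 0.0), dt=0.01, n_steps=15000):
    """Integrate with classical RK4 (an assumption); returns shape (n_steps, 3)."""
    traj = np.empty((n_steps, 3))
    z = np.asarray(z0, dtype=float)
    for i in range(n_steps):
        traj[i] = z                       # store the current state, starting with z0
        k1 = lorenz_rhs(z)
        k2 = lorenz_rhs(z + 0.5 * dt * k1)
        k3 = lorenz_rhs(z + 0.5 * dt * k2)
        k4 = lorenz_rhs(z + dt * k3)
        z = z + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return traj

chaos_raw = simulate_lorenz()             # 150 s sampled every 0.01 s
```

Each column of `chaos_raw` corresponds to one dimension of the chaotic raw data.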

Information on the four chosen raw datasets is shown in Table 1.

Table 1 Information of raw data

3 Methodology of dataset generation


To generate a comprehensive set of benchmarks with respect to different frequency features, an FIR filter-based approach is developed. As illustrated in Fig. 1, the generation process includes an FIR filter and a problem setting module. The FIR filter is responsible for transforming each raw dataset into four datasets with low-pass, high-pass, band-pass, and band-stop features. In the problem setting module, the transformed data are processed into prediction problems.

Fig. 1 Generation process of frequency-based prediction problems

3.1 Data transformation


A transformation is needed to generate required data from stock, COVID-19, wind speed, and chaos data. As shown in Fig. 2, the transform process could be denoted as

$$\begin{aligned} y(t) = f(x), \end{aligned}$$
(2)

where x is the raw data, y(t) is the tth variable in the generated data, and \(f(\cdot )\) is the transform relation. To enhance the frequency features of the raw instances, the FIR filter-based approach is adopted as the transform relation.

Fig. 2 The general data transform form

3.2 FIR filter-based approach

The FIR filter refers to a linear shift-invariant (LSI) system [60], and the discrete input–output relation could be represented as

$$\begin{aligned} y(t) = \alpha [1] x(t) + \alpha [2] x(t-1) + \dots + \alpha [K] x(t - K + 1), \end{aligned}$$
(3)

where K is the order of the FIR filter, \(\alpha [k]\) is the kth parameter of the filter, and y(t) and x(t) are the output variable and the raw data at time t, respectively. It is noteworthy that the filter order K is usually odd to facilitate the design procedure [61].
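As a small illustration of (3), the input–output relation can be evaluated with a discrete convolution. The sketch below assumes NumPy and keeps only the outputs for which all K past samples exist; the helper name `apply_fir` is hypothetical.

```python
import numpy as np

def apply_fir(x, alpha):
    """Evaluate (3) for every t that has the full history x(t), ..., x(t-K+1)."""
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    # mode="valid" drops the outputs that would need samples before the start of x
    return np.convolve(x, alpha, mode="valid")

# Two-tap averaging filter: y(t) = 0.5 x(t) + 0.5 x(t-1)
y = apply_fir([1.0, 2.0, 3.0, 4.0, 5.0], [0.5, 0.5])
```

For an input of length M and a filter with K parameters, the output has length M - K + 1.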

The frequency characteristics could be designed by setting the filter parameters \(\alpha [k]\), which could be generally computed by the product of impulse response h[k] and window function v[k] as

$$\begin{aligned} \alpha [k] = h[k] v[k]. \end{aligned}$$
(4)

The impulse responses h[k] of the four filter types, i.e., low pass, high pass, band pass, and band stop, can be described as:

$$\begin{aligned} h_{\text {lp}}[k] = \frac{\sin (\omega _{l}(k-\tau ))}{\pi (k-\tau )}, \end{aligned}$$
(5)
$$\begin{aligned} h_{\text {hp}}[k] = \frac{\sin (\pi (k-\tau )) - \sin (\omega _{h}(k-\tau ))}{\pi (k-\tau )}, \end{aligned}$$
(6)
$$\begin{aligned} h_{\text {bp}}[k] = \frac{\sin (\omega _{h}(k-\tau )) - \sin (\omega _{l}(k-\tau ))}{\pi (k-\tau )}, \end{aligned}$$
(7)
$$\begin{aligned} h_{\text {bs}}[k] = \frac{\sin (\pi (k-\tau )) + \sin (\omega _{l}(k-\tau )) - \sin (\omega _{h}(k-\tau ))}{\pi (k-\tau )}, \end{aligned}$$
(8)

where \(h_{\text {lp}}[k], h_{\text {hp}}[k], h_{\text {bp}}[k]\), and \(h_{\text {bs}}[k]\) are the impulse responses of the low-pass, high-pass, band-pass, and band-stop filters at point k, respectively, \(\omega _{l}\) and \(\omega _{h}\) are the low and high cutoff frequencies, and \(\tau =\frac{K-1}{2}\) is the group delay parameter. At \(k=\tau\), each expression is evaluated as its limit.

To maintain only the characteristics in the selected frequency band, the adopted window function should have small leakage errors. The Hanning window, one of the most popular window functions [62, 63], has a narrow out-of-band spectrum [64], which can enhance the frequency characteristics of the data. As a result, the Hanning function is adopted as the window function of the FIR filter in this study, expressed as

$$\begin{aligned} v[k] = 0.5\left(1 - \cos \left(\frac{k\pi }{\tau }\right)\right). \end{aligned}$$
(9)
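The window method of (4)–(9) can be sketched as below. The sketch makes several assumptions that are not stated in the text: the index runs over \(k = 0, \dots , K-1\), K is odd, the all-pass term is \(\sin (\pi (k-\tau ))\), the band-pass response is taken as the high-cutoff term minus the low-cutoff term, and the sample at \(k=\tau\) is replaced by its limit. The function names are hypothetical.

```python
import numpy as np

def hann_window(K):
    """Window (9) on k = 0, ..., K-1 with tau = (K-1)/2."""
    tau = (K - 1) / 2
    k = np.arange(K)
    return 0.5 * (1.0 - np.cos(k * np.pi / tau))

def ideal_lowpass(K, omega):
    """Ideal low-pass response (5); the k = tau sample is the limit omega/pi."""
    tau = (K - 1) // 2                    # integer because K is assumed odd
    n = np.arange(K) - tau
    h = np.full(K, omega / np.pi)
    nz = n != 0
    h[nz] = np.sin(omega * n[nz]) / (np.pi * n[nz])
    return h

def fir_coefficients(kind, K, omega_l=None, omega_h=None):
    """Filter parameters alpha[k] = h[k] v[k] as in (4)."""
    if K < 3 or K % 2 == 0:
        raise ValueError("this sketch assumes an odd K >= 3")
    allpass = ideal_lowpass(K, np.pi)     # unit impulse at k = tau, up to rounding
    if kind == "lp":
        h = ideal_lowpass(K, omega_l)
    elif kind == "hp":
        h = allpass - ideal_lowpass(K, omega_h)
    elif kind == "bp":
        h = ideal_lowpass(K, omega_h) - ideal_lowpass(K, omega_l)
    elif kind == "bs":
        h = allpass + ideal_lowpass(K, omega_l) - ideal_lowpass(K, omega_h)
    else:
        raise ValueError(f"unknown filter kind: {kind}")
    return h * hann_window(K)

alpha_lp = fir_coefficients("lp", 31, omega_l=0.3 * np.pi)
alpha_hp = fir_coefficients("hp", 31, omega_h=0.3 * np.pi)
```

Under these assumptions, a low-pass and a high-pass filter with the same cutoff sum to a delayed unit impulse, which is a convenient sanity check on the design.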

To further enhance heterogeneity of transformation, a variation method is proposed to adjust the parameters of filter. The sine function is applied to fluctuate cutoff frequencies in order to generate heterogeneous filter parameters. The fluctuating cutoff frequencies at time t could be expressed as

$$\begin{aligned} \left\{ \begin{aligned} \omega _{\text {l, adj}}[t] =&\frac{\omega _{l}}{B_{l,1}} + \frac{\omega _{l}}{B_{l,2}} \sin \left( \frac{2 \pi t}{T_{l}}\right) \\ \omega _{\text {h, adj}}[t] =&\frac{\omega _{h}}{B_{h,1}} + \frac{\omega _{h}}{B_{h,2}} \sin \left( \frac{2 \pi t}{T_{h}}\right) \end{aligned} \right. \end{aligned}$$
(10)

where \(\omega _\text {l, adj}[t]\) and \(\omega _\text {h, adj}[t]\) are the varying low and high cutoff frequencies at time t, \(B_{\text {l},1}\) and \(B_{\text {l},2}\) are adjustable parameters for low cutoff frequency, \(B_{\text {h},1}\) and \(B_{\text {h},2}\) are adjustable parameters for high cutoff frequency, \(T_\text {l}\) and \(T_\text {h}\) are the variation period of low and high cutoff frequencies.

For band-pass and band-stop filters, the parameters of the variation method must satisfy the limit \(\omega _\text {h, adj}[t] > \omega _\text {l, adj}[t]\). The constraint can be written as

$$\begin{aligned} \omega _{l}\left(\frac{1}{B_{l,1}} + \frac{1}{B_{l,2}}\right) < \omega _{h}\left(\frac{1}{B_{h,1}} + \frac{1}{B_{h,2}}\right) \end{aligned}$$
(11)
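The variation method can be illustrated with a short numerical sketch. It assumes that the high cutoff in (10) is built from \(\omega _{h}\), in line with the right-hand side of (11), and it checks the limit \(\omega _\text {h, adj}[t] > \omega _\text {l, adj}[t]\) pointwise. All parameter values below are hypothetical and chosen only for illustration.

```python
import numpy as np

def varying_cutoff(omega, b1, b2, period, t):
    """One line of (10): omega/b1 + (omega/b2) * sin(2*pi*t/period)."""
    return omega / b1 + (omega / b2) * np.sin(2.0 * np.pi * t / period)

# Hypothetical settings, not the competition parameters
t = np.arange(1000)
omega_l, omega_h = 0.2 * np.pi, 0.6 * np.pi
low = varying_cutoff(omega_l, b1=1.0, b2=10.0, period=200.0, t=t)
high = varying_cutoff(omega_h, b1=1.0, b2=10.0, period=300.0, t=t)

# Pointwise check of the band-pass/band-stop limit over the sampled range
limit_holds = bool(np.all(high > low))
```

With these values the low cutoff stays within 10% of \(\omega _{l}\) and the high cutoff within 10% of \(\omega _{h}\), so the two bands never cross.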

3.3 Computation simplification

To reduce the computational cost, the transformation process is divided into sub-matrices. Assume the raw data \(D\in {\mathbb {R}}^{M\times N}\) have N dimensions and length M. The nth dimensional vector \(D_n\in {\mathbb {R}}^{M\times 1}\) in D can be denoted as

$$\begin{aligned} D_n = [d_n(1), d_n(2), \dots , d_n(M)]^T \end{aligned}$$
(12)

where \(d_n(m)\) is the variable at time m in \(D_n\). The transformed output by the K order FIR filter can be expressed as

$$\begin{aligned} \begin{aligned} u_n(m) =\,&\alpha _{n,m}[1]d_n(m) + \alpha _{n,m}[2]d_n(m-1)\\&+ \dots + \alpha _{n,m}[K]d_n(m-K+1), \end{aligned} \end{aligned}$$
(13)

where \(u_n(m)\) is the transformed variable at time m, \(\alpha _{n,m}[k]\) is the kth filter parameter for \(u_n(m)\). It is worth noting that FIR filter requires sufficient historical data, so m should satisfy \(m\ge K\).

As a result, the transform process could be denoted as

$$\begin{aligned} U_n = P_n^T D_n, \end{aligned}$$
(14)

where \(U_n\in {\mathbb {R}}^{(M-K+1)\times 1}\) is the transformed result, and \(P_n\) is the parameter matrix for the nth dimensional vector \(D_n\). It contains the filter parameters of each transformation and can be expressed as (15), where \(m'=M-K+1\).

$$\begin{aligned} P_n = \left[ {\begin{array}{*{20}{c}} {{\alpha _{n,1}}[1]}&{}{{\alpha _{n,1}}[2]}&{} \cdots &{}{{\alpha _{n,1}}[K]}&{}0&{} \cdots &{}{}\\ 0&{}{{\alpha _{n,2}}[1]}&{}{{\alpha _{n,2}}[2]}&{} \cdots &{}{{\alpha _{n,2}}[K]}&{}0&{} \cdots \\ {}&{}{}&{}{}&{} \vdots &{}{}&{}{}&{}{}\\ 0&{} \cdots &{}0&{}{{\alpha _{n,m'}}[1]}&{} \cdots &{}{{\alpha _{n,m'}}[K - 1]}&{}{{\alpha _{n,m'}}[K]} \end{array}} \right] _{M\times (M-K+1)} \end{aligned}$$
(15)

The proposed approach has to compute the filter parameters for each data point, which is computationally expensive. Therefore, to reduce the computation time, a parameter matrix \(P'_n\) is assembled from several sub-matrices, expressed as

$$\begin{aligned} P'_n = \left[ {\begin{array}{*{20}{c}} {P'_n(1)}&{}{}&{}{}&{}{}\\ {}&{}{P'_n(2)}&{}{}&{}{}\\ {}&{}{}&{} {\ddots } &{}{}\\ {}&{}{}&{}{}&{}{P'_n(Q)} \end{array}} \right] _{QM'\times QN'}, \end{aligned}$$
(16)

where \(P'_n(q)\in {\mathbb {R}}^{M'\times N'}\) is the qth sub-matrix in \(P'_n\), and \(N'=M'+K-1\). Q is the number of sub-matrices, and the elements of \(P'_n(q)\) are arranged in the same form as (15). In this competition, for simplicity, all sub-matrices are set to be identical. The transform process can be denoted as

$$\begin{aligned} U'_n=P'_n D'_n \end{aligned}$$
(17)

where \(U'_n\in {\mathbb {R}}^{(QM')\times 1}\) is the result of the simplified transformation and \(D'_n\in {\mathbb {R}}^{(QN')\times 1}\) is a segment of \(D_n\).
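Because all sub-matrices are identical, the product in (17) can be computed segment by segment without forming the block-diagonal matrix. The following sketch assumes the row layout shown in (15), with each row holding the K filter parameters shifted by one column, and uses a hypothetical three-parameter filter; the helper names are illustrative only.

```python
import numpy as np

def build_block(alpha, m_sub):
    """One M' x N' sub-matrix with the row layout of (15); N' = M' + K - 1."""
    alpha = np.asarray(alpha, dtype=float)
    K = alpha.size
    block = np.zeros((m_sub, m_sub + K - 1))
    for i in range(m_sub):
        block[i, i:i + K] = alpha          # row i holds the K parameters, shifted by i
    return block

def transform_by_blocks(d_part, block):
    """Compute U' = P' D' without forming the block-diagonal matrix P'."""
    m_sub, n_sub = block.shape
    segments = np.asarray(d_part, dtype=float).reshape(-1, n_sub)  # Q rows of length N'
    return (segments @ block.T).ravel()                            # length Q * M'

rng = np.random.default_rng(0)
alpha = np.array([0.25, 0.5, 0.25])        # hypothetical filter with K = 3
block = build_block(alpha, m_sub=4)        # M' = 4, N' = 6
d_part = rng.normal(size=3 * 6)            # Q = 3 segments
u_fast = transform_by_blocks(d_part, block)
u_full = np.kron(np.eye(3), block) @ d_part  # explicit P' of (16), for comparison only
```

The segment-wise product stores a single \(M'\times N'\) block instead of the full \(QM'\times QN'\) matrix while producing the same output.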

3.4 Problem setting

To account for the various requirements of time-series tasks, a problem setting module is proposed. By setting its parameters, certain values in the transformed data are removed and left blank for prediction.

Assume there are two different blank setting stages. Here \(M_{\text {front}}\) is the length of the preserved front data, \(M_{\text {stage},1}\) and \(M_{\text {stage},2}\) are the upper bounds of the two stage ranges, \(T_{\text {stage},1}\) and \(T_{\text {stage},2}\) are the period lengths of the two stages, \(K_{\text {stage},1}\) and \(K_{\text {stage},2}\) are the numbers of values set as blank in each period of the two stages, and MOD denotes the remainder operation.

The main steps of problem setting module could be described as follows.

step1: Leave the first \(M_{\text {front}}\) data points unprocessed, and let \(s=1\).

step2: In stage s, set \(K_{\text {stage}, s}\) continuous blanks in every period of length \(T_{\text {stage}, s}\) for data \(d_n(m)\), where \(M_{\text {stage},(s-1)}\le m \le M_{\text {stage}, s}\). If \(s=1\), \(M_{\text {stage},(s-1)}\) is the length of preserved data \(M_{\text {front}}\).

step3: If s is not the last setting stage, let \(s=s+1\) and return to step2.

The detailed problem setting procedure is described in Algorithm 1. Note that, in general, the last part of the data is preserved so that researchers can validate their approaches.

Algorithm 1 Problem setting procedure
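The steps of the problem setting module can be sketched as follows. This is only one possible reading: the sketch assumes that the blanks are placed at the end of each period, that indices are 0-based, that each stage covers a half-open range, and that a blank is marked with NaN. The function name `set_blanks` and the stage values are hypothetical.

```python
import numpy as np

def set_blanks(series, m_front, stages):
    """Blank out K values per period of T values inside each stage.

    stages: list of (m_stage, t_stage, k_stage) tuples in increasing m_stage order.
    Assumption: blanks sit at the end of each period, and stage s covers the
    0-based half-open index range [previous bound, m_stage).
    """
    out = np.asarray(series, dtype=float).copy()
    start = m_front                        # the first m_front values stay untouched
    for m_stage, t_stage, k_stage in stages:
        for m in range(start, min(m_stage, out.size)):
            if (m - start) % t_stage >= t_stage - k_stage:   # the MOD test
                out[m] = np.nan            # NaN marks a value to be predicted
        start = m_stage
    return out

problem = set_blanks(np.arange(60.0), m_front=10, stages=[(30, 10, 3), (50, 5, 1)])
blank_index = np.flatnonzero(np.isnan(problem)).tolist()
```

In this toy setting the first stage blanks three values out of every ten, the second stage blanks one value out of every five, and the data after the last stage remain available for validation.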

3.5 Computational complexity analysis

The computational complexity of NCAA2022 dataset generation process is analyzed in this subsection. According to (16), the computational complexity of \(P'_n\) parameter generation could be expressed as

$$\begin{aligned} C_{\text {para}, n} = O(QM'K), n=1,2,\dots ,N. \end{aligned}$$
(18)

After calculating the parameter matrix, the computational complexity of the transformation in nth dimension using matrix multiplication, i.e., (17), could be expressed as

$$\begin{aligned} C_{\text {mul}, n} = O(QN'M'), n=1,2,\dots ,N. \end{aligned}$$
(19)

Since \(N'=M'+K-1\ge K\), the multiplication term dominates the parameter generation term, so the overall computational complexity of the transformation process can be expressed as

$$\begin{aligned} \begin{aligned} C_{\text {trans}}&= \sum _{n=1}^{N}C_{\text {para}, n} + \sum _{n=1}^{N}C_{\text {mul}, n} \\&= O(QM'KN) + O(QN'M'N) \\&\approx O(QN'M'N). \end{aligned} \end{aligned}$$
(20)

For the problem setting module with two different blank setting stages, according to Algorithm 1, the computational complexity can be expressed as

$$\begin{aligned} C_\text {problem} = O(QM'N). \end{aligned}$$
(21)

4 Dataset details and evaluation

4.1 The setting parameters of NCAA2022 dataset

The four raw datasets are transformed into 16 problems by the process described in Sect. 3. Part of the details of the generated data is shown in Table 2. In the NCAA2022 time-series competition, the prediction task is to impute the blanks in the first dimension. The problem setting parameters of the first dimension are shown in Table 3, and the transformation parameters are shown in Table 4.

Table 2 Information of problems
Table 3 Parameters of the first dimension problem setting
Table 4 Transformation parameters

4.2 Analysis of transformed results

To validate the transformed results, the discrete Fourier transform (DFT) is introduced to analyze the frequency characteristics. The continuous Fourier transform is defined as

$$\begin{aligned} F(\theta )=\int \limits _{-\infty }^\infty f(t) e ^{-i\theta t}dt, \end{aligned}$$
(22)

where \(F(\theta )\) is the spectrum of f(t) at frequency \(\theta\). Equation (22) can be generalized to discrete functions by letting \(t \rightarrow t_k\), where \(t_k\) is discrete time [65]. The DFT can be denoted as

$$\begin{aligned} F[\theta _k] = \sum _{t_k = 0}^{N-1}f[t_k]e^{-2\pi i \theta _k t_k/N} \end{aligned}$$
(23)

where \(F[\theta _k]\) is the spectrum of the series \(f[t_k]\) sampled at regular frequency intervals, with \(t_k = 0, \dots , N-1\). For real-valued data, the transformed result is conjugate-symmetric, so only half of the results contain useful information [66].
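The symmetry property can be checked numerically. The sketch below assumes NumPy's FFT routines, which use the same sign convention as (23), and a synthetic sine wave in place of the competition data; `half_spectrum` is a hypothetical helper name.

```python
import numpy as np

def half_spectrum(series):
    """Magnitude of the DFT (23) on the non-redundant half of the frequency bins."""
    series = np.asarray(series, dtype=float)
    return np.abs(np.fft.rfft(series))     # bins 0, ..., N//2 for a real input

n = 64
t = np.arange(n)
signal = np.sin(2.0 * np.pi * 5 * t / n)   # exactly 5 cycles in the record
full = np.fft.fft(signal)                  # all N bins, for the symmetry check
mag = half_spectrum(signal)
peak_bin = int(np.argmax(mag))
```

For a real-valued input, bin k of the full transform is the complex conjugate of bin N - k, which is why plotting the first half of the bins is sufficient.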

Fig. 3 The frequency distribution (first dimension) of stock raw data and related problems

Fig. 4 The frequency distribution (first dimension) of wind raw data and related problems

Fig. 5 The frequency distribution (first dimension) of chaotic raw data and related problems

Fig. 6 The frequency distribution (first dimension) of pandemic raw data and related problems

The frequency distributions of the raw data and the transformed results are shown in Figs. 3, 4, 5 and 6. As shown in Fig. 3, the stock raw data are mainly distributed at low frequencies, and the transform process is effective. The comparison in Fig. 4 shows that the wind speed raw data have a wide frequency distribution. The chaotic system is solved by simulation, so the frequency content of the chaotic raw data is distributed smoothly, but the transform process introduces slight noise, as can be observed in Fig. 5. Although the pandemic-related data are short, their periodic characteristic is obvious, as shown in Fig. 6. The DFT analysis illustrates that the transform process can effectively enhance the frequency characteristics of the raw data.

4.3 Evaluation metric of NCAA2022 competition

The mean absolute percentage error (MAPE) is adopted to evaluate the prediction results of different problems in this competition. The MAPE of each problem could be expressed as follows:

$$\begin{aligned} MAPE_i=\frac{1}{Q} \sum _{j=1}^{Q} \left|\frac{d^\text {actual}_{i, j} - d^\text {predicted}_{i, j}}{d^\text {actual}_{i, j}} \right|, \end{aligned}$$
(24)

where \(MAPE_i\) is the MAPE of the ith problem, Q is the number of removed data points in the first dimension, and \(d^{\text {predicted}}_{i, j}\) and \(d^{\text {actual}}_{i, j}\) are the predicted value and the actual value, respectively.

As the raw data are transformed by four types of filters, the evaluation scores are divided according to filter type. In other words, the prediction results are ranked in 4 groups: problems 1–4, 5–8, 9–12, and 13–16. The ranking within each group is ordered by the performance score, i.e., the mean MAPE over the four problems of the group, denoted as

$$\begin{aligned} \text {score} = \frac{1}{4} \sum MAPE_i. \end{aligned}$$
(25)
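The competition metric in (24) and (25) can be sketched as follows, assuming NumPy arrays of actual and predicted values and a list of 16 per-problem MAPE values ordered by problem number. The function names and the numbers in the example are hypothetical.

```python
import numpy as np

def mape(actual, predicted):
    """MAPE of one problem as in (24); actual values must be non-zero."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)))

def group_scores(mape_per_problem):
    """Score (25) for each filter-type group: problems 1-4, 5-8, 9-12, 13-16."""
    values = np.asarray(mape_per_problem, dtype=float).reshape(4, 4)
    return values.mean(axis=1)

example = mape([100.0, 200.0], [110.0, 180.0])   # both relative errors equal 0.1
scores = group_scores(np.arange(16) / 100.0)     # hypothetical MAPE values
```

Since (24) divides by the actual value, the metric is undefined whenever a removed data point equals zero, which is worth checking before scoring.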

4.4 Other evaluation metrics

Besides the abovementioned evaluation method, other evaluation metrics are used in related research, such as mean absolute error (MAE), root-mean-square error (RMSE), normalized root-mean-square error (nRMSE), and symmetric mean absolute percentage error (sMAPE), as shown in (26)–(29).

$$\begin{aligned} \text {MAE}_i = \frac{1}{Q}\sum _{j=1}^{Q} \left| d^{\text {actual}}_{i, j} - d^{\text {predicted}}_{i, j} \right| \end{aligned}$$
(26)
$$\begin{aligned} \text {RMSE}_i = \sqrt{\frac{1}{Q}\sum _{j=1}^{Q} ( d^{\text {actual}}_{i, j} - d^{\text {predicted}}_{i, j})^2} \end{aligned}$$
(27)
$$\begin{aligned} \text {nRMSE}_i = \frac{1}{d^{\text {mean}}_{i}}\sqrt{\frac{1}{Q}\sum _{j=1}^{Q} ( d^{\text {actual}}_{i, j} - d^{\text {predicted}}_{i, j})^2} \end{aligned}$$
(28)
$$\begin{aligned} \text {sMAPE}_i = \frac{1}{Q} \sum _{j=1}^{Q} \frac{ \left| d^{\text {actual}}_{i, j} - d^{\text {predicted}}_{i, j} \right| }{\left| d^{\text {actual}}_{i, j} \right| + \left| d^{\text {predicted}}_{i, j} \right| } \end{aligned}$$
(29)

where \(\text {MAE}_i\), \(\text {RMSE}_i\), \(\text {nRMSE}_i\), and \(\text {sMAPE}_i\) are the MAE, RMSE, nRMSE, and sMAPE of the ith problem, respectively, and \(d^\text {mean}_{i}\) is the mean value of the actual data.

The four indicators reflect different characteristics of prediction performance. Generally, MAE and sMAPE show the overall error and are adopted to evaluate wind-related and pandemic-related problems. RMSE and nRMSE amplify extreme prediction errors, so they are suitable for evaluating problems with continuity, e.g., stock-related and chaos-related problems.
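For reference, the four metrics in (26)–(29) can be written compactly as below. The sketch assumes NumPy arrays as inputs and follows the sMAPE form of (29), without the factor of 2 used in some other definitions; the example values are hypothetical.

```python
import numpy as np

def mae(actual, predicted):
    return float(np.mean(np.abs(actual - predicted)))            # (26)

def rmse(actual, predicted):
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))    # (27)

def nrmse(actual, predicted):
    return rmse(actual, predicted) / float(np.mean(actual))      # (28)

def smape(actual, predicted):
    denom = np.abs(actual) + np.abs(predicted)
    return float(np.mean(np.abs(actual - predicted) / denom))    # (29)

actual = np.array([1.0, 2.0, 3.0])
predicted = np.array([1.0, 2.0, 5.0])     # a single large error on the last point
```

In this example the single large error raises RMSE (about 1.15) well above MAE (about 0.67), which illustrates how RMSE amplifies extreme prediction errors.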

4.5 Baseline experiment

In this subsection, we select five baseline approaches, i.e., autoregressive exogenous (ARX) model, back-propagation (BP) network, echo state network (ESN), recurrent neural network (RNN), and long short-term memory (LSTM), to run on the NCAA2022 dataset.

The hyperparameter settings follow [35, 67, 68]. For the BP network, 2 hidden layers with 250 nodes are used, and the learning rate is chosen as 0.03. The RNN and LSTM networks each include 2 hidden layers, with 200 and 250 units, and the learning rate is set as 0.01. The variance of the Gaussian noise in the ARX model is set as 0.5. Parameter selection for ESN differs from that of BP-based networks, so we run an enumeration experiment over each hyperparameter in steps of 0.1. The hyperparameters with the best results are then used in the baseline experiments: the reservoir size is set as 70, and the uniform range of the input weights and the spectral radius of the reservoir are set to 0.7 and 0.8, respectively.

For simplicity and experimental fairness, only the historical data in the first dimension are used as input for the baseline approaches, and the preserved part of the data is selected as the training set. A sliding window is adopted to pre-process the data, and the window size of each baseline method is set as 7.
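The sliding-window pre-processing can be sketched as follows. The sketch assumes one-step-ahead targets, i.e., each window of 7 past values is paired with the value that immediately follows it, which is an assumption and not a detail given in the text; `make_windows` is a hypothetical helper name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(series, window=7):
    """Pair each run of `window` past values with the value that follows it."""
    series = np.asarray(series, dtype=float)
    inputs = sliding_window_view(series, window)[:-1]   # shape (n - window, window)
    targets = series[window:]                           # shape (n - window,)
    return inputs, targets

X, y = make_windows(np.arange(10.0))      # toy series 0, 1, ..., 9
```

A series of length n yields n - 7 training pairs, so short instances such as the pandemic data provide only a few hundred samples.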

Table 5 MAPE results of baseline methods

The MAPE results of the baseline approaches are shown in Table 5. Each method outperforms the others on certain problems. Owing to its special training mechanism, ESN has good convergence properties [69]. Therefore, ESN has better overall performance than the other baseline approaches when the training length is short, especially on stock-related problems. In addition, it is hard for LSTM to fit its parameters with a short training set, so LSTM performs worse than RNN on pandemic-related problems. The BP neural network has fewer weight parameters and requires less training data than the RNN and LSTM networks [70], and therefore its performance is close to or better than that of RNN on problems with mainly low-pass data. However, owing to its memory capability, RNN performs better than the BP network on pandemic-related data, which have strong periodic characteristics. ARX is more sensitive to low-frequency data because its algorithm is simple, and the added Gaussian noise makes it perform well on band-stop data; it even outperforms the neural-based approaches in problem XIII.

Fig. 7 The evaluation score of baseline approaches’ performance

The scores of the baseline methods are shown in Fig. 7. The scores reflect the characteristics of the baseline approaches on problems with various frequency features. The ARX model performs poorly on low-pass problems; however, its performance is close to that of the other baseline approaches on the other types of problems. The performance of the neural-based approaches on high-pass problems supports the conclusion in [36], i.e., neural network-based methods generalize poorly on high-frequency signals. The baseline approaches show varying performance in each category, which indicates that the proposed benchmarks can provide a frequency-based evaluation platform for time-series prediction algorithms.

5 Conclusion

In this paper, four raw datasets are transformed into 16 problems as the prediction task of the NCAA2022 time-series competition. Each raw dataset has its own characteristics and covers a popular research area. Different from existing competition datasets, the four raw datasets are transformed into various problems from the perspective of the frequency domain. To reduce the computational burden, the filter parameters are divided into sub-matrices. The comparative results in the frequency domain illustrate that the transform process is effective. With the NCAA2022 evaluation metric, five baseline approaches are run on the problems to further validate the benchmarking capability of the NCAA2022 dataset.

Although the effectiveness of the NCAA2022 dataset is validated, some limitations still exist. For example, the NCAA2022 dataset is generated from only four widely studied instances, and other hot topics of time-series prediction are not included. Furthermore, the current scale of NCAA2022 may not suit large-scale prediction approaches. Our future work is to add instances with further characteristics and to study evaluation metrics that efficiently demonstrate the frequency features of algorithms.