1 Introduction

The “12th Five-Year Plan (2011–2015)” of the Chinese government sets out the goals, major tasks and standards for the harmless disposal of urban waste and for the waste-to-energy (WTE) incineration industry [1]. Li et al. [2] found that, compared with the landfills widely used in China, incineration can reduce the volume of waste by more than 90%, a high degree of reduction, and leaves no material that decomposes further. The ash left after incineration is clean and stable and can be used as soil mulch. However, incineration also has disadvantages, such as high project investment, high operating costs and difficult management. In addition, incineration produces air pollutants, which damage the atmospheric environment to a certain extent.

SO2 is an acid gas that is harmful to the atmosphere; once released into the air, it readily reacts chemically and forms acid rain. Reference [4] shows that an increase in the concentration of SO2 in the air directly affects the human respiratory system, causes respiratory diseases and endangers human health. According to the national environmental protection requirements, during domestic waste incineration the 1 h average emission concentration of SO2 should not exceed 100 mg/m3, and the 1 day average emission concentration should not exceed 80 mg/m3. However, the historical sensor monitoring data of the waste incineration plant show that the SO2 emission concentration still exceeded the standard. It is therefore vital to establish an SO2 concentration prediction mechanism that predicts the SO2 concentration at future moments in time, so that SO2 emissions can be controlled promptly.

We mainly established an AR-LSTM prediction model based on ARIMA and LSTM. The main contributions are as follows:

  (1) Data preprocessing. We carry out data analysis and prediction on a real data set collected from the production line of the enterprise. The data set is relatively cluttered, with many missing values and outliers, so we preprocess it extensively to facilitate the subsequent analysis and prediction.

  (2) Feature extraction. The SO2 concentration sequence is the target sequence, but the dataset contains several other sequences. We use the Pearson correlation coefficient and recursive feature elimination (RFE) to screen the related sequences, so that the influence of the relevant sequences on the target sequence is retained and the accuracy of analysis and prediction is improved.

  (3) Analytical modeling. We use the ARIMA model and the LSTM network for analysis and prediction. The ARIMA model performs time-series analysis and prediction on the SO2 target sequence, while the LSTM network performs correlation analysis and prediction on the remaining related sequences.

This paper consists of five parts. The remaining parts are as follows: the second part introduces related work; the third part presents the method, including data set processing and construction of the AR-LSTM model; the fourth part covers the experiments and analysis, including visualization of the predicted results and comparison experiments with other models; the fifth part summarizes and analyzes the overall work and results of this paper.

2 Related Work

The autoregressive integrated moving average (ARIMA) model is a classical time series prediction model and the most commonly used model for fitting stationary series. George [6] introduced the ARIMA model in detail; Le Jian et al. [7] used ARIMA to study the influence of meteorological factors on the concentration of submicron particles (UFP) and particulate matter 1.0 (PM1.0) under busy traffic conditions; Khashei et al. [8] improved the ARIMA model and proposed FARIMA on its basis. With the continuous development of machine learning, some researchers have combined statistical methods with artificial intelligence algorithms, for example by combining ARIMA with neural network models. Li et al. [9] combined MGM and a BP neural network with ARIMA to construct MGM-ARIMA and BP-ARIMA for predicting coal consumption; Liu et al. [10] used a combination of ARIMA, ANNS and EMS to predict time series data of PM2.5 concentration; Zhu [11] proposed a new technique for PM2.5 concentration prediction based on ARIMA and an improved BP neural network; Wang et al. [12] proposed the HDIPSO prediction algorithm based on neural networks, which adopts a new speed updating strategy and a mutation operation to improve convergence and increase group diversity.

With the continuous development of deep learning in recent years, prediction models based on deep learning have been widely applied, chief among them models based on Long Short-Term Memory (LSTM) [13]. Hochreiter and Schmidhuber [14] proposed the LSTM network in 1997 to solve the vanishing gradient problem; Pathan Refat Khan et al. [15] used LSTM to predict the gene mutation rate of the novel coronavirus (COVID-19); Jun Hu et al. [16] proposed an LSTM model with a transformation mechanism and designed an adaptive course learning mechanism; Bai et al. [17] proposed an ensemble long short-term memory neural network (E-LSTM), which has better predictive performance than a single LSTM; Qi [18] proposed a hybrid deep learning model that combines a graph convolutional network with Long Short-Term Memory (GC-LSTM) for prediction; Karim [19] used an LSTM sub-module to enhance a fully convolutional network for time series classification and proposed the ALSTM-FCN. Combining models has become a popular research direction for predicting the concentration of SO2 and other pollutants. U. Brunelli et al. [20] proposed a predictor based on a recurrent neural network (Elman model) and used it to perform daily analysis of SO2, O3, PM10 and other pollutants in Palermo. For the prediction of the maximum concentration, RMSE, MAE and MSE are used as evaluation indicators of the model's effectiveness; Bingyue Pan [21] used the XGBoost algorithm to analyze air quality monitoring data in Tian**, predicted the PM2.5 concentration and compared the results with other well-known data mining models; Tecer et al. [22] first applied artificial neural networks to SO2 concentration prediction and confirmed that they can be effectively used for air quality analysis and prediction.

Linliang Zhang et al. [23] proposed a method for predicting SO2 concentration based on fuzzy time series and a support vector machine (SVM), establishing an SO2 concentration prediction model with the one-hour average SO2 concentration as sample data; Shams et al. [24] recently compared the performance of artificial neural networks (ANN) and multiple linear regression (MLR) in predicting SO2 concentration. Their research shows the importance of artificial neural network modeling and its application in reducing urban pollution. Our starting point is the same as theirs, namely to control and reduce pollutant emissions, and the evaluation indexes we select are also the same. The difference is that the LSTM network used in this article is a more recent architecture than the networks they use. **ang Li et al. [25] proposed the LSTME model, which extends the LSTM model to capture the long-term temporal and spatial correlation of air pollutant concentrations. Compared with the traditional RNN model, it showed better prediction performance.

An analysis of existing work on multivariate time-series analysis and prediction shows that, although remarkable results have been achieved, some limitations and several major challenges remain:

  (1) The diversity of time series data. A change in one sequence is often determined by multiple other sequences and, at the same time, often affects multiple sequences. This complicates the data analysis process.

  (2) Timing of time series data. Time series data change with time and often follow a regular pattern within a certain period, which greatly affects the model's overall analysis of the data.

  (3) Instability of time series data. Time series data are often real data sets, and most contain abnormal information such as missing values and abrupt changes. How to deal with this abnormal information is also a great challenge for multivariate time series prediction.

This paper analyzes and predicts time series data with the above characteristics in mind. First, the data set is preprocessed to handle missing values, outliers and other abnormal information. Second, relevant variables are selected through feature extraction to remove redundant information. Finally, the ARIMA model and the LSTM network are used to analyze and predict the time series data, and the experimental results demonstrate the effectiveness of the proposed model.

3 Data Processing and Modeling

This chapter consists of two parts. The first part is data processing; the second part is modelling. The first part introduces the process of data processing in detail; the second part introduces the structure of AR-LSTM model in detail. Table 1 clearly describes the data set used in our experiment.

Table 1 Detail of the datasets

3.1 Data Processing

3.1.1 Missing Value Processing

To ensure the integrity of the time series, we process the missing values by forward filling: each missing value is replaced with the last observed value before it.
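Forward filling can be sketched in a few lines. The example below is a minimal illustration, not the preprocessing code used in this work; the function name `forward_fill` and the sample readings are hypothetical. A missing value at the very start of a series has no predecessor and is left as `None`.

```python
import math

def forward_fill(values):
    """Replace each missing entry (None or NaN) with the last observed value."""
    filled, last = [], None
    for v in values:
        missing = v is None or (isinstance(v, float) and math.isnan(v))
        if not missing:
            last = v          # remember the most recent valid reading
        filled.append(last)   # a leading gap stays None
    return filled

# Hypothetical hourly SO2 readings with gaps.
so2 = [41.0, None, None, 38.5, float("nan"), 40.2]
print(forward_fill(so2))
```

With pandas, `Series.ffill()` performs the same operation on a column of a data frame.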

3.1.2 Feature Extraction

Observation of the historical data of waste incineration plants shows that many related factors affect the concentration of SO2 emissions. To ensure the prediction accuracy of the AR-LSTM model, we first use feature selection to pick, from the many related factors, those that are highly correlated with the SO2 emission concentration. The Pearson correlation coefficient (Pearson) and the recursive feature elimination algorithm (RFE) are used for feature selection.

  (1) Pearson correlation coefficient

The Pearson correlation coefficient measures the degree of linear correlation between two variables and takes values in [−1, 1]. The closer the coefficient is to 1 or −1, the stronger the correlation; the closer it is to 0, the weaker the correlation [26, 27]. It is calculated as the ratio of the covariance of the two variables to the product of their standard deviations. The formula is as follows:

$$ \rho_{{{\text{x}},y}} = \frac{{\sum\nolimits_{i = 1}^{N} {\left( {x_{i} - \overline{x}} \right)\left( {y_{i} - \overline{y}} \right)} }}{{\sqrt {\sum\nolimits_{i = 1}^{N} {(x_{i} - \overline{x})^{2} } } \sqrt {\sum\nolimits_{i = 1}^{N} {(y_{i} - \overline{y})^{2} } } }} $$
(1)

where \(x_{i}\) and \(y_{i}\) are the i-th observations of the two variables x and y, \(\overline{x}\) and \(\overline{y}\) are their means, and N is the number of observations.
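As a minimal sketch, Eq. (1) can be computed directly with NumPy. The function name `pearson` and the sample values are hypothetical and only illustrate the formula.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences, Eq. (1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()   # deviations from the mean of x
    dy = y - y.mean()   # deviations from the mean of y
    return float((dx * dy).sum() / (np.sqrt((dx ** 2).sum()) * np.sqrt((dy ** 2).sum())))

# Hypothetical readings: outlet O2 concentration and outlet SO2 concentration.
o2 = [8.1, 7.9, 8.4, 8.8, 9.0]
so2 = [52.0, 55.0, 47.0, 43.0, 40.0]
r = pearson(o2, so2)
print(r)
```

Features can then be ranked by the absolute value of `r`; `np.corrcoef` returns the same quantity for a whole matrix of variables at once.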

The Pearson correlation coefficient is calculated between the outlet SO2 emission concentration and each of the other influencing factors, and features are selected according to the absolute value of the coefficient. Tables 2 and 3 show the results of the Pearson correlation analysis.

Table 2 Inlet correlation coefficient
Table 3 Export correlation coefficient

According to the Pearson correlation coefficients between the SO2 emission concentration and the factors of each process stage, the outlet concentration of SO2 is highly correlated with the outlet concentrations of CO and O2, the inlet coal feeder, inlet feeder, inlet secondary fan, inlet cloth bag dust removal pressure difference, furnace fault temperature and furnace top temperature. Next, the recursive feature elimination algorithm (RFE) is used for a second round of feature selection. The final features are determined by combining the results of the two selections.

  (2) Recursive feature elimination algorithm

Recursive feature elimination is a greedy algorithm for finding an optimal feature subset. Given a set of features, it repeatedly shrinks the set until the desired number of features is reached, and the remaining features are selected. The algorithm proceeds as follows:

  1. Train the classifier;

  2. Calculate the importance of each feature;

  3. Eliminate the features of lowest importance;

  4. Train again, repeating until the number of features reaches the set value.

This work performs RFE with RFECV, using the classic SVM-RFE algorithm with 3-fold cross-validation for feature selection. The dataset contains a total of 17 related variables, that is, 17 features, so there are \(2^{17} - 1\) non-empty feature subsets in total. Multiple rounds of training are performed with the SVM algorithm, the validation error of the candidate feature subsets is calculated, and the features in the subset with the smallest error rate are selected until the number of features meets our requirements. Table 4 shows the feature selection results.

Table 4 Recursive feature filter table

Applying the recursive feature elimination algorithm to the data of the three lines A, B and C of the waste incineration plant shows that the coal feeder, feeder, CO concentration, O2 concentration, inlet secondary fan, dust bag pressure difference and furnace fault temperature are of relatively high importance. Combining the Pearson and RFE results on the historical data, six features (coal feeder, feeder, secondary fan, dust bag pressure difference, O2 concentration and CO concentration) are selected as the input of the subsequent AR-LSTM prediction model.
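The elimination loop described above can be sketched as follows. This is a simplified illustration, not the SVM-RFE with cross-validation used in this work: it swaps in an ordinary least-squares linear model as the estimator, uses the absolute coefficient on standardized features as the importance, and stops at a fixed number of features `n_keep` instead of choosing the number by cross-validation. The function name `rfe_linear` and the synthetic data are hypothetical.

```python
import numpy as np

def rfe_linear(X, y, n_keep):
    """Recursive feature elimination with a least-squares linear model.

    Returns the column indices of the n_keep surviving features.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standardize so that coefficient magnitudes are comparable across features.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    y = y - y.mean()
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        coef, *_ = np.linalg.lstsq(X[:, remaining], y, rcond=None)  # train
        importance = np.abs(coef)                                   # importance of each feature
        remaining.pop(int(np.argmin(importance)))                   # drop the least important one
    return remaining

# Synthetic data: only columns 0 and 3 actually drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=200)
print(rfe_linear(X, y, n_keep=2))
```

For the cross-validated variant, scikit-learn provides `sklearn.feature_selection.RFECV`, which accepts an estimator such as a linear-kernel SVM and a `cv` argument for the number of folds.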

3.2 AR-LSTM Model

The AR-LSTM model diagram constructed is shown in Fig. 1.

Fig. 1 AR-LSTM model diagram

3.2.1 ARIMA

ARIMA, the Autoregressive Integrated Moving Average model, is a well-known time series prediction model. Its basic idea is that the historical data sequence formed over time is regarded as a random sequence whose trend can be approximated by an appropriate mathematical model. Once the model is established, the existing historical information can be used to predict unknown values at future moments [28]. Depending on whether the original sequence is stationary and on which terms the regression contains, the model family includes the moving average process (MA(q)), the autoregressive process (AR(p)), the autoregressive moving average process (ARMA(p, q)) and the autoregressive integrated moving average process (ARIMA(p, d, q)). Here AR denotes autoregression and p is the number of autoregressive terms; MA denotes the moving average and q is the number of moving average terms; d is the number of times the series is differenced to make it stationary [29].

ARIMA (p, d, q) can be expressed as follows:

$$ w_{t} = \Delta^{d} x_{t} = \left( {1 - L} \right)^{d} x_{t} $$
(2)
$$ w_{t} = \phi_{1} w_{t - 1} + \phi_{2} w_{t - 2} + \cdots + \phi_{p} w_{t - p} + \mu_{t} + \theta_{1} \mu_{t - 1} + \cdots + \theta_{q} \mu_{t - q} $$
(3)

In Eq. (2), d is the number of differences needed to transform a non-stationary series into a stationary one, \(\Delta\) is the difference operator, L is the lag operator (\(L x_{t} = x_{t - 1}\)), and \(\Delta^{d} x_{t}\) is the d-th order difference sequence.

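In Eq. (3), \(w_{t}\) is the current value of the differenced series, \(\phi_{1}, \ldots, \phi_{p}\) are the autoregressive coefficients, \(\theta_{1}, \ldots, \theta_{q}\) are the moving average coefficients, and \(\mu_{t}\) is white noise.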
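To make Eqs. (2) and (3) concrete, the sketch below handles the special case ARIMA(p, 1, 0): it differences the series once, estimates the autoregressive coefficients by ordinary least squares, and forecasts one step ahead. This is an illustration under simplifying assumptions (no moving average terms, least squares rather than the maximum likelihood estimation that statistical packages typically use); the function names and the simulated data are hypothetical.

```python
import numpy as np

def fit_ar(w, p):
    """Least-squares estimate of phi_1..phi_p in Eq. (3) with q = 0."""
    w = np.asarray(w, dtype=float)
    # The row for time t holds the lags (w[t-1], ..., w[t-p]).
    lags = np.column_stack([w[p - k : len(w) - k] for k in range(1, p + 1)])
    phi, *_ = np.linalg.lstsq(lags, w[p:], rcond=None)
    return phi

def forecast_next(x, p):
    """One-step-ahead forecast of x under ARIMA(p, 1, 0)."""
    x = np.asarray(x, dtype=float)
    w = np.diff(x, n=1)                 # Eq. (2) with d = 1
    phi = fit_ar(w, p)
    w_next = phi @ w[-1 : -p - 1 : -1]  # phi_1 * w_t + ... + phi_p * w_{t-p+1}
    return float(x[-1] + w_next)        # undo the differencing

# Simulated series whose first difference follows an AR(2) process.
rng = np.random.default_rng(0)
w = np.zeros(500)
for t in range(2, 500):
    w[t] = 0.6 * w[t - 1] - 0.3 * w[t - 2] + rng.normal()
x = 50.0 + np.cumsum(w)

print(fit_ar(np.diff(x), p=2))   # estimates should lie near 0.6 and -0.3
print(forecast_next(x, p=2))
```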

The prediction of ARIMA is divided into the following steps:

  (1) Obtain the time series data;

  (2) Draw a trend chart of the data to observe whether the series is stationary. A non-stationary series is first differenced d times to make it stationary. If the series is already stationary, the ARMA(p, q) model is used directly.

  (3) For the stationary series, compute the autocorrelation function (ACF) and the partial autocorrelation function (PACF). By analyzing the autocorrelation and partial autocorrelation, the optimal orders p and q are obtained.

  (4) Build the ARIMA model from the d, q and p obtained above. The existing historical series is then used to test the model, to ensure that the error between the actual and predicted values stays within the set threshold. If the error does not meet expectations, return to step (2).

The autocorrelation function at lag k is defined as:

$$ ACF\left( {\text{k}} \right) = \rho_{k} = \frac{{Cov\left( {y_{t} ,y_{t - k} } \right)}}{{Var\left( {{\text{y}}_{t} } \right)}} $$
(4)

In Eq. (4), \(y_{t}\) represents the value at time t,\(y_{t - k}\) represents the value at time t − k.
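A sample estimate of Eq. (4) can be sketched as below. The estimator shown, which divides the lag-k sum of products by the total sum of squared deviations, is one common convention and is an assumption here; the function name `acf` and the sample series are hypothetical.

```python
import numpy as np

def acf(y, k):
    """Sample autocorrelation of the series y at lag k, following Eq. (4)."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()                  # deviations from the series mean
    n = len(d)
    # Lag-k covariance over variance; the common 1/n factors cancel.
    return float((d[k:] * d[: n - k]).sum() / (d ** 2).sum())

# Hypothetical series that alternates around its mean.
series = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
print([acf(series, k) for k in range(3)])
```

Plotting these values against k gives the ACF chart that is read when choosing the model orders.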

3.2.2 LSTM

LSTM stands for Long Short-Term Memory. It was first put forward by Hochreiter and Schmidhuber as a special recurrent neural network that addresses the vanishing gradient problem of the recurrent neural network (RNN). The network uses a node structure different from that of ordinary neurons and introduces the concept of control gates [30,31,32]. LSTM networks are suitable for the classification, processing and prediction of time series data. The node is called the LSTM cell, and its structure is shown in Fig. 2.

Fig. 2 LSTM structural unit diagram

The LSTM unit [33, 34] has four parts: the input, the control signal of the input gate, the control signal of the forget gate and the control signal of the output gate. The input gate controls whether writing is allowed, the forget gate controls whether the value of the memory unit needs to be updated, and the output gate controls whether output is allowed. The internal structure is shown in Fig. 3.

Fig. 3 LSTM Unit

Z is the input, Zi is the control signal of the input gate, Zf is the control signal of the forget gate, and Zo is the control signal of the output gate. The function f(x) is usually the sigmoid:

$$ f\left( x \right) = \frac{1}{{1 + e^{ - x} }} $$
(5)

The value of this function lies between 0 and 1 and indicates how far the gate is open. g(x) and h(x) are activation functions. First, Z is passed through the activation function to get g(Z), Zi is passed through the sigmoid to get f(Zi), and the two are multiplied to get g(Z)f(Zi). Zf is passed through the sigmoid to get f(Zf), which is multiplied by the value c held in the memory unit from the previous time step to get cf(Zf). The value of the memory unit is then updated to:

$$ c^{^{\prime}} = g\left( Z \right)f\left( {Z_{i} } \right) + cf\left( {Z_{f} } \right) $$
(6)

c′ is passed through the activation function to get h(c′), Zo is passed through the sigmoid to get f(Zo), and the two are multiplied to get the output:

$$ a = h\left( {c^{^{\prime}} } \right)f\left( {Z_{o} } \right) $$
(7)

The input xt at time t and the output ht−1 of the hidden layer neuron at time t − 1 together form the input of the hidden layer at time t. They are multiplied by different weight vectors and offset by the corresponding biases to obtain the control signals Zf, Zi and Zo of the three gates and the input value Z. The formulas are as follows:

$$ Z_{f} = \omega_{f} \left[ {h_{t - 1} ,x_{t} } \right] + b_{f} $$
(8)
$$ Z_{i} = \omega_{i} \left[ {h_{t - 1} ,x_{t} } \right] + b_{i} $$
(9)
$$ Z_{o} = \omega_{o} \left[ {h_{t - 1} ,x_{t} } \right] + b_{o} $$
(10)
$$ Z = \omega_{x} \left[ {h_{t - 1} ,x_{t} } \right] + b_{x} $$
(11)

Here \(\omega\) denotes a weight vector and \(b\) a bias; bf, bi, bo and bx are the biases of the respective connection weights. Substituting Eqs. (8), (9) and (11) into Eq. (6), the updated value of the memory unit is:

$$ c^{\prime } = g\left( {\omega_{x} \left[ {h_{t - 1} ,x_{t} } \right] + b_{x} } \right)f\left( {\omega_{i} \left[ {h_{t - 1} ,x_{t} } \right] + b_{i} } \right) + cf\left( {\omega_{f} \left[ {h_{t - 1} ,x_{t} } \right] + b_{f} } \right) $$
(12)

The output of hidden layer neurons is:

$$ \begin{aligned} h_{t} & = h\left( {c^{\prime } } \right)f\left( {Z_{o} } \right) \\ & { = }h\left( {g\left( {\omega_{x} \left[ {h_{t - 1} ,x_{t} } \right] + b_{x} } \right) \cdot f\left( {\omega_{i} \left[ {h_{t - 1} ,x_{t} } \right] + b_{i} } \right) + cf\left( {\omega_{{\text{f}}} \left[ {h_{t - 1} ,x_{t} } \right] + b_{f} } \right)} \right) \\ & \quad \cdot f\left( {\omega_{o} \left[ {h_{t - 1} ,x_{t} } \right] + b_{o} } \right) \\ \end{aligned} $$
(13)

From the above, it can be seen that the input of an LSTM unit includes not only the output ht−1 of the hidden layer neuron at the previous time step but also the value of the memory unit inside the LSTM unit. The LSTM network can therefore effectively mitigate vanishing gradients, remember long-term historical information and fit long time series more effectively [35]. For this reason, LSTM neural networks are widely used in speech recognition, image recognition, sequence prediction and other areas, with good results.
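One time step of the LSTM unit in Eqs. (6) to (13) can be sketched in NumPy as follows. The text does not fix the activation functions g and h, so tanh is assumed for both here; the weights are written as matrices so that one call updates a whole layer of units. The function names, sizes and random inputs are hypothetical.

```python
import numpy as np

def sigmoid(x):
    """Gate function f(x) of Eq. (5)."""
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step; W and b are dicts keyed by 'f', 'i', 'o', 'x'."""
    v = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    z_f = W["f"] @ v + b["f"]                # Eq. (8), forget gate signal
    z_i = W["i"] @ v + b["i"]                # Eq. (9), input gate signal
    z_o = W["o"] @ v + b["o"]                # Eq. (10), output gate signal
    z = W["x"] @ v + b["x"]                  # Eq. (11), input value
    c_new = np.tanh(z) * sigmoid(z_i) + c_prev * sigmoid(z_f)  # Eq. (6), g assumed tanh
    h_new = np.tanh(c_new) * sigmoid(z_o)                      # Eq. (7), h assumed tanh
    return h_new, c_new

# Hypothetical sizes: six input features and four hidden units.
rng = np.random.default_rng(0)
n_in, n_hidden = 6, 4
W = {k: rng.normal(scale=0.1, size=(n_hidden, n_hidden + n_in)) for k in "fiox"}
b = {k: np.zeros(n_hidden) for k in "fiox"}

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_in)):       # run five time steps
    h, c = lstm_step(x_t, h, c, W, b)
print(h)
```

The memory value `c` is carried from step to step alongside `h`, which is the property the paragraph above refers to.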

3.2.3 AR-LSTM

The AR-LSTM model, based on the ARIMA model and the LSTM neural network, is used to predict the concentration of SO2. On the one hand, the ARIMA model predicts from the linear relationship between time and the SO2 concentration data. On the other hand, the LSTM neural network predicts from the features extracted in Section 3.1.2. Finally, the CRITIC weight assignment method is used to calculate a weight for the prediction results of each of the two models, and the two predictions are combined according to these weights.

The CRITIC method is an objective weight assignment method. Its basic idea is to determine the objective weight of each index based on contrast and conflict. The conflict between the jth index and the other indices is \(\sum\limits_{k = 1}^{J} {\left( {1 - r_{k,j} } \right)}\), j = 1, 2, 3, …, J. Let Cj be the amount of information contained in the jth evaluation index; Cj is expressed as:

$$ C_{j} = \sigma_{j} \sum\limits_{k = 1}^{J} {\left( {1 - r_{k,j} } \right)} ,\quad j = 1,2,3, \ldots ,J $$
(14)

where \(\sigma_{j}\) is the standard deviation of the jth index and \(r_{k,j}\) is the correlation coefficient between evaluation indices k and j. The objective weight wj of the jth index is:

$$ w_{j} = \frac{{C_{j} }}{{\sum\nolimits_{k = 1}^{J} {C_{k} } }},\quad j = 1,2,3, \ldots ,J $$
(15)
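Eqs. (14) and (15) can be sketched as below, treating the predictions of each model as one index (one column). This is an illustrative reading of the method rather than the exact procedure of this work: no prior normalization of the columns is applied, the sample standard deviation is used, and the prediction values are hypothetical. With only two columns the conflict term is the same for both, so the weights are proportional to the standard deviations; the weights are undefined if the two columns are perfectly correlated.

```python
import numpy as np

def critic_weights(scores):
    """CRITIC weights for the columns of scores (rows are samples)."""
    scores = np.asarray(scores, dtype=float)
    sigma = scores.std(axis=0, ddof=1)        # contrast: standard deviation of each index
    r = np.corrcoef(scores, rowvar=False)     # correlation between indices
    conflict = (1.0 - r).sum(axis=0)          # sum over k of (1 - r_kj)
    c = sigma * conflict                      # Eq. (14)
    return c / c.sum()                        # Eq. (15)

# Hypothetical SO2 predictions from the two component models.
arima_pred = np.array([40.1, 42.3, 39.8, 45.0, 43.2])
lstm_pred = np.array([41.0, 41.5, 40.9, 43.8, 44.1])

w = critic_weights(np.column_stack([arima_pred, lstm_pred]))
combined = w[0] * arima_pred + w[1] * lstm_pred   # weighted combination of the two forecasts
print(w, combined)
```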

The AR-LSTM model consists of an input layer, a hidden layer and an output layer. The input vectors X1 and X2 are the time series data and the SO2 concentration data, respectively. The input vector (X3, …, X9) comprises the six feature values and the SO2 concentration data. The activation function is the ELU function, the layers are fully connected, and the final output Yt is the concentration of SO2. The model is trained with the Adam optimizer.

4 Experiment and Analysis

The experiments use the Python 3.7 programming language, the TensorFlow 1.13.1 deep learning framework and the Keras library.

4.1 ARIMA modeling

Before modelling with the ARIMA model, the stationarity of the data set should be tested. If the series is stationary, no processing is required; if it is non-stationary, the data set must be differenced. We use the augmented Dickey-Fuller (ADF) unit root test to determine whether each data set is a stationary series. Table 5 shows the unit root test results of the two data sets.

Table 5 ADF test results

It can be seen from Table 5 that the ADF statistics of both data sets are greater than the three critical values, so the unit root hypothesis cannot be rejected and both data sets are non-stationary sequences. First-order differencing is used to make the two data sets stationary.

Based on the first-order differenced data, the orders of the ARIMA model are determined from the autocorrelation function (ACF) and the partial autocorrelation function (PACF). Hourly predictions are made with the ARIMA(2, 1, 0) model and daily predictions with the ARIMA(1, 1, 0) model. Figure 4 shows the hourly and daily prediction results, where blue is the true value and orange is the predicted value.

Fig. 4 Prediction results of the ARIMA model. The abscissa is time and the ordinate is SO2 concentration. a Hourly predictions over a certain period of time. b Daily predictions over a certain period of time. In both panels, blue is the actual value and orange is the predicted value

4.2 LSTM Modeling

The features extracted in Section 3.1.2 (six features: inlet coal feeder, inlet feeder, inlet secondary fan, inlet cloth bag dust removal pressure difference, outlet O2 concentration and outlet CO concentration) are taken as the input of the LSTM neural network. The output of the network is the concentration of SO2. Because the experimental data set is small and multi-layer neural networks overfit easily, an LSTM neural network with a single hidden layer is used for prediction. The loss function is the mean absolute error (MAE) and the optimization algorithm is Adam. To determine the number of neurons in the hidden layer, experiments are carried out with 25, 50, 75 and 100 neurons. The evaluation indexes are the root mean square error (RMSE), the coefficient of determination (R2) and the mean absolute error (MAE). The comparison results are shown in Table 6.

Table 6 Comparison table of different neuron number

R2 is the coefficient of determination, which measures the goodness of fit of the model and is most commonly used to evaluate regression models; the closer the value is to 1, the better the fit. RMSE is the root mean square error, which measures the deviation between the predicted and actual values and is often used as the measurement standard for machine learning prediction results. MAE is the mean absolute error, the mean of the absolute errors, which reflects the actual size of the prediction error well.
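The three indexes can be computed from their standard definitions as in the sketch below. The function name `evaluate` and the sample values are hypothetical.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Return R2, RMSE and MAE for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    ss_res = (err ** 2).sum()                       # residual sum of squares
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
    return {
        "R2": float(1.0 - ss_res / ss_tot),
        "RMSE": float(np.sqrt((err ** 2).mean())),
        "MAE": float(np.abs(err).mean()),
    }

# Hypothetical actual and predicted values.
print(evaluate([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0]))
```

Because RMSE squares the errors before averaging, a single large miss raises it more than it raises MAE, which is why the two are usually reported together.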

As shown in Table 6, increasing the number of neurons does not necessarily decrease the error and may even increase it. The error is smallest with 50 neurons, so the number of neurons in the hidden layer of the LSTM neural network is set to 50. The model is trained for 50 epochs with a batch size of 72. Figure 5 shows the value of the loss function during the training of the LSTM neural network.

Fig. 5 Loss function value graph

After the LSTM neural network has been trained, hourly and daily predictions are carried out on the two test sets, and the predicted SO2 concentrations are output and visualized. The black curve is the predicted value, and the red curve is the actual value. The hourly and daily prediction results of the LSTM neural network model are shown in Fig. 6.

Fig. 6 Prediction results of the LSTM model. The abscissa is time and the ordinate is SO2 concentration. a Hourly predictions over a certain period of time. b Daily predictions over a certain period of time. In both panels, blue is the actual value and orange is the predicted value

4.3 AR-LSTM Modeling

We use the AR-LSTM model, based on ARIMA and the LSTM neural network, to predict the SO2 concentration. In this model, the ARIMA model and the LSTM neural network run in parallel on the same data set: the former performs time series prediction and the latter multi-dimensional correlation factor prediction. Because the experimental data are real data from a waste incineration plant, neither the ARIMA model alone nor the LSTM neural network alone achieves the ideal prediction effect. Therefore, the AR-LSTM model uses the two models to predict the SO2 concentration separately. The CRITIC weight assignment method then calculates the weights of the two models, each in the range [0, 1], and the two predictions are combined according to these weights to obtain the final prediction. Hourly forecasts are made for [‘2019/9/1 00:00’–‘2019/9/14 3:00’] and daily forecasts for [‘2019/9/1’–‘2019/9/24’]. Figures 7 and 8 show the comparison results of the four models.

Fig. 7 Comparison of the prediction results of the four models over a fixed period of time on the hourly data

Fig. 8 Comparison of the prediction results of the four models over a certain period of time on the daily data

The hourly and daily predictions of the ARIMA model, the LSTM neural network model and the AR-LSTM model are evaluated and compared using the three evaluation indexes. The comparison results of R2, RMSE and MAE of each model are shown in Table 7.

Table 7 Evaluation index result table

It can be seen from Table 7 that the AR-LSTM model outperforms the single ARIMA model and the LSTM neural network model on all three indexes, for both hourly and daily forecasts. The indicators of LSTM-RNN are close to those of AR-LSTM, but AR-LSTM is still slightly better. The hourly prediction error of all three models is lower than the daily prediction error. Hourly prediction has more training data than daily prediction, which leads to the higher accuracy of the AR-LSTM model in hourly prediction. In summary, these results support the effectiveness of the AR-LSTM model. On this basis, we use the model to predict SO2 emissions in the next few weeks; the prediction results are shown in Fig. 9.

Fig. 9 SO2 emission prediction for the next four weeks. The solid black line is the actual value and the dotted blue line is the predicted value

5 Conclusions

An AR-LSTM model is constructed to predict the concentration of SO2 hourly and daily and to predict SO2 emissions in the next few weeks. Data processing mainly includes three aspects: missing value processing, data marking and feature extraction. Prediction experiments are then carried out with the ARIMA model, the LSTM model, the LSTM-RNN model and the AR-LSTM model on the same data set. Finally, R2, RMSE and MAE are selected as the evaluation indexes for each model, and the prediction results of the models are compared visually. Based on the analysis of the predicted results and the comparative experimental results, the following conclusions can be drawn:

  (1) Judging from the prediction results, the accuracy of hourly prediction is higher than that of daily prediction. This is because far more data are available for hourly prediction than for daily prediction. In hourly prediction the model is better trained and fits the data better, whereas the daily model is trained on relatively little data, which leads to lower prediction accuracy.

  (2) According to the comparative experimental results, the coefficient of determination (R2) of the AR-LSTM model is higher than that of the single ARIMA model and the LSTM neural network, whether the prediction is hourly or daily. The root mean square error (RMSE) and mean absolute error (MAE) of the AR-LSTM model are smaller than those of the ARIMA model and the LSTM neural network. The indicators of LSTM-RNN are close to those of AR-LSTM, but AR-LSTM is still slightly better. This supports the effectiveness of the AR-LSTM combination model to some extent.

  (3) Finally, having established the effectiveness of the AR-LSTM model, we use it to predict the SO2 emissions in the next few weeks and visualize the predicted results.