Prediction of SO2 Concentration Based on AR-LSTM Neural Network

Ju, Jie; Liu, Ke’nan; Liu, Fang’ai

doi:10.1007/s11063-022-11119-7

Published: 24 December 2022

Volume 55, pages 5923–5941, (2023)
Neural Processing Letters

Abstract

Sulphur dioxide is one of the most common air pollutants. It forms acid rain and other harmful substances in the atmosphere, which damage ecosystems and cause respiratory diseases in humans. It is therefore essential to monitor the concentration of sulphur dioxide produced in industrial processes in real time, to predict emission concentrations over the next few hours or days, and to control them in advance. To address this problem, we propose AR-LSTM, an analytical forecasting model based on ARIMA and LSTM. We preprocess the sensors' time series data set and then carry out modeling and analysis. We evaluate the proposed model on two data sets and compare it with other models using three evaluation indicators: R², RMSE and MAE. The results demonstrate the effectiveness of the AR-LSTM model. Finally, we use the proposed model to forecast emissions for the coming weeks.

1 Introduction

The 12th Five-Year Plan (2011–2015) of the Chinese government sets out the goals, major tasks and standards for the harmless disposal of urban waste and for the waste-to-energy (WTE) incineration industry [1]. Li et al. [2] found that, compared with the landfills widely used in China, incineration can reduce the volume of waste by more than 90%, a high degree of reduction with no further decomposition. The waste is burned cleanly and left in a stable state, and the ash that remains can be used as soil mulch. However, incineration also has disadvantages, such as high project investment, high operating cost and difficult management. In addition, burning produces air pollution, which damages the atmospheric environment to a certain extent.

SO₂ is an acid gas harmful to the atmosphere: once released into the air it readily reacts chemically and forms acid rain. Reference [4] shows that an increase in the concentration of SO₂ in the air directly affects the respiratory system, causes respiratory diseases and endangers human health. According to national environmental protection requirements, during domestic waste incineration the 1-hour average emission concentration of SO₂ should not exceed 100 mg/m³, and the 1-day average emission concentration should not exceed 80 mg/m³. However, the historical sensor monitoring data of the waste incineration plant show that the SO₂ emission concentration still exceeded the standard at times. It is therefore vital to establish an SO₂ concentration prediction mechanism that predicts the SO₂ concentration at future moments in good time, so that SO₂ emissions can be controlled promptly.

We mainly established an AR-LSTM prediction model based on ARIMA and LSTM. The main contributions are as follows:

(1)
Data preprocessing. We carry out data analysis and prediction on a real data set collected from the production line of the enterprise. The data set is relatively cluttered, with many missing values and outliers, so we preprocess it extensively to facilitate the subsequent analysis and prediction.
(2)
Feature extraction. We take the SO₂ concentration sequence in the data set as the target sequence, but the data set contains several other sequences. We use the Pearson correlation coefficient and RFE to screen the related sequences, retaining those that influence the target sequence and improving the accuracy of analysis and prediction.
(3)
Analytical modeling. We use the ARIMA model and the LSTM network for analysis and prediction. The ARIMA model performs time series analysis and prediction on the SO₂ target sequence, while the LSTM network performs correlation analysis and prediction on the remaining related sequences.

This paper consists of five parts. The second part introduces related work. The third part describes the method, including data set processing and the construction of the AR-LSTM model. The fourth part presents the experiments and analysis, including the visualization of the predicted results and the comparison with other models. The fifth part summarizes the overall work and results of this paper.

2 Related Work

Autoregressive integrated moving average (ARIMA) is a classical model for time series prediction and the most commonly used model for fitting stationary series. George [6] introduced the ARIMA model in detail; Le Jian et al. [7] used ARIMA to study the influence of meteorological factors on the concentration of submicron particles (UFP) and particulate matter 1.0 (PM1.0) under busy traffic conditions; Khashei et al. [8] improved the ARIMA model and proposed FARIMA. With the continued development of machine learning, some researchers have combined statistical methods with artificial intelligence algorithms, for example ARIMA with neural network models. Li et al. [9] combined MGM and a BP neural network with ARIMA to construct MGM-ARIMA and BP-ARIMA for predicting coal consumption; Liu et al. [10] used a combined prediction based on ARIMA, ANNS and EMS to predict time series data of PM2.5 concentration; Zhu [11] proposed a new technique for PM2.5 concentration prediction based on ARIMA and an improved BP neural network; Wang et al. [12] proposed the HDIPSO prediction algorithm based on neural networks, which adopts a new speed updating strategy and a mutation operation to improve convergence and increase group diversity.

With the continued development of deep learning in recent years, prediction models based on deep learning have been widely applied, chiefly those based on Long Short-Term Memory (LSTM) [13]. Hochreiter and Schmidhuber [14] proposed the LSTM network in 1997 to solve the vanishing gradient problem; Pathan Refat Khan et al. [15] used LSTM to predict the gene mutation rate of the novel coronavirus (COVID-19); Jun Hu et al. [16] proposed an LSTM model with a transformation mechanism and designed an adaptive course learning mechanism; Bai et al. [17] proposed an ensemble long short-term memory neural network (E-LSTM), which has better predictive performance than a single LSTM; Qi [18] proposed a hybrid deep learning model that combines a graph convolutional network with Long Short-Term Memory (GC-LSTM) for prediction; Karim [19] used an LSTM sub-module to enhance a fully convolutional network for time series classification and proposed ALSTM-FCN.

Model combination has become a popular research direction for predicting the concentration of SO₂ and other pollutants. U. Brunelli et al. [20] proposed a predictor based on a recurrent neural network (Elman model) and used it to perform daily analysis of SO₂, O₃, PM10 and other pollutants in Palermo. For predicting the maximum concentration, RMSE, MAE and MSE are used as evaluation indicators of the effectiveness of the model. Bingyue Pan [21] used the XGBoost algorithm to analyze air quality monitoring data in Tianjin, predicted the PM2.5 concentration and compared the results with other well-known data mining models. Tecer et al. [22] were the first to apply artificial neural networks to SO₂ concentration prediction and confirmed that artificial neural networks can be used effectively for air quality analysis and prediction.

Linliang Zhang et al. [23] proposed a method for predicting SO₂ concentration based on fuzzy time series and a support vector machine (SVM), establishing an SO₂ concentration prediction model with the one-hour average SO₂ concentration as sample data. Shams et al. [24] recently compared the performance of an artificial neural network (ANN) and multiple linear regression (MLR) in predicting SO₂ concentration. Their research shows the importance of artificial neural network modeling and its application in reducing urban pollution. Our starting point is the same as theirs: the purpose is to control and reduce pollutant emissions, and the selected evaluation indexes are also the same. The difference is that the LSTM network used in this article is a more recent architecture than the network they use. Xiang Li et al. [25] proposed the LSTME model, which extends the LSTM model to capture the long-term temporal and spatial correlation of air pollutant concentrations. The model was compared with the traditional RNN model and showed better prediction performance.

A review of the results of many researchers in multivariate time series analysis and prediction shows that, although remarkable results have been achieved, several limitations and major challenges remain:

(1)
The diversity of time series data. Changes in one sequence of time series data are often determined by multiple sequences, and at the same time, changes in one sequence often affect changes in multiple sequences. This complicates the data analysis process.
(2)
Timing of time series data. Time series data is data that changes with time and often has a change rule within a certain period of time, which greatly affects the overall analysis of the data by the model.
(3)
Instability of time series data. Time series data are often real-world data sets, and most of them contain abnormal information such as missing values and abrupt changes. How to deal with this abnormal information is also a great challenge for multivariate time series prediction.

This paper carries out analysis and prediction based on the above characteristics of time series data. First, the data set is preprocessed to handle missing values, outliers and other abnormal information. Second, relevant variables are filtered through feature extraction to remove redundant information. Finally, the ARIMA model and the LSTM network are used to analyze and predict the time series data, and the experimental results demonstrate the effectiveness of the proposed model.

3 Data Processing and Modeling

This section consists of two parts. The first part describes the data processing in detail; the second part describes the structure of the AR-LSTM model in detail. Table 1 describes the data sets used in our experiments.

Table 1 Detail of the datasets

3.1 Data Processing

3.1.1 Missing Value Processing

To ensure the integrity of the time series, we handle missing values by forward filling: each missing value is replaced with the most recent observed value before it.
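As an illustration of forward filling, the following minimal sketch uses pandas on a small made-up table. The column names and readings are hypothetical and are not taken from the plant data set.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly sensor readings; NaN marks a missing value.
readings = pd.DataFrame(
    {
        "so2": [35.0, np.nan, np.nan, 41.5, 40.0],
        "o2": [8.1, 8.3, np.nan, 8.0, np.nan],
    },
    index=pd.date_range("2019-09-01 00:00", periods=5, freq=pd.Timedelta(hours=1)),
)

# Forward fill: each missing value takes the last observed value before it.
filled = readings.ffill()
print(filled)
```

Note that a missing value at the very start of a series has no earlier observation and would remain missing after forward filling.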

3.1.2 Feature Extraction

Historical data from waste incineration plants show that many related factors affect the SO₂ emission concentration. To ensure the prediction accuracy of the AR-LSTM model, we first use feature selection to screen out, from the many related factors, those most strongly correlated with the SO₂ emission concentration. The Pearson correlation coefficient (Pearson) and the recursive feature elimination algorithm (RFE) are used for feature selection.

(1)
Pearson correlation coefficient

The Pearson correlation coefficient (Pearson) measures the degree of linear correlation between two variables and takes values in the range [−1, 1]. The closer the coefficient is to 1 or −1, the stronger the correlation; the closer it is to 0, the weaker the correlation [26, 27]. Essentially, the Pearson correlation coefficient is the ratio of the covariance of the two variables to the product of their standard deviations. The formula is as follows:

$$ \rho_{{{\text{x}},y}} = \frac{{\sum\nolimits_{i = 1}^{N} {\left( {x_{i} - \overline{x}} \right)\left( {y_{i} - \overline{y}} \right)} }}{{\sqrt {\sum\nolimits_{i = 1}^{N} {(x_{i} - \overline{x})^{2} } } \sqrt {\sum\nolimits_{i = 1}^{N} {(y_{i} - \overline{y})^{2} } } }} $$

(1)

where x and y are the two variables being compared, $\overline{x}$ and $\overline{y}$ are their sample means, and N is the number of observations.
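Eq. (1) can be computed directly, as in the following minimal NumPy sketch. The two short series are made-up values used only to exercise the function.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series, as in Eq. (1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

# Hypothetical readings: SO2 concentration and CO concentration.
so2 = [30.0, 42.0, 51.0, 58.0, 70.0]
co = [5.0, 7.5, 9.0, 10.0, 13.0]
r = pearson(so2, co)
print(round(r, 4))
```

Features can then be ranked by the absolute value of `r`, since a strong negative correlation is as informative as a strong positive one.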

We calculate the Pearson correlation coefficient between the outlet SO₂ emission concentration and each of the other relevant influencing factors, and select features according to the absolute value of the coefficient. Tables 2 and 3 show the results of the Pearson correlation analysis.

Table 2 Inlet correlation coefficient

Table 3 Export correlation coefficient

According to the Pearson correlation coefficients between the SO₂ emission concentration and the factors in each stage of the process, the outlet concentration of SO₂ is highly correlated with the outlet concentrations of CO and O₂, the inlet coal feeder, the inlet feeder, the inlet secondary fan, the inlet cloth bag dust removal pressure difference, the furnace fault temperature and the furnace top temperature. Next, the recursive feature elimination algorithm (RFE) is used for a second round of feature selection. The final features are determined by combining the results of the two feature selections.

(2)
Recursive feature elimination algorithm

Recursive feature elimination is a greedy algorithm for finding a good feature subset. Given a set of features, it repeatedly shrinks the feature set until the desired number of features is reached, and the remaining features are selected. The algorithm proceeds as follows:

1.
Train the classifier;
2.
Calculate the importance of each feature;
3.
Eliminate features of low importance;
4.
Train again until the number of features reaches the set value;

This work uses RFECV to perform RFE, applying the classic SVM-RFE algorithm with 3-fold cross-validation for feature selection. The data set contains 17 related variables, that is, 17 features, so there are 2¹⁷ − 1 non-empty feature subsets in total. The SVM is trained over multiple rounds, the validation error of each candidate feature subset is calculated, and the features contained in the subset with the smallest error are selected once the number of features meets our requirements. Table 4 shows the feature selection results.
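The following sketch shows how such a selection could be set up with scikit-learn's `RFECV`. It is not the authors' code: the data are synthetic, and because the SO₂ target is continuous the sketch assumes a linear-kernel `SVR` as the SVM estimator, which is an assumption on our part.

```python
import numpy as np
from sklearn.feature_selection import RFECV
from sklearn.svm import SVR

# Synthetic stand-in for the plant data: 17 candidate features, of which
# only the first three actually drive the target (hypothetical setup).
rng = np.random.default_rng(0)
n_samples, n_features = 200, 17
X = rng.normal(size=(n_samples, n_features))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2] + 0.1 * rng.normal(size=n_samples)

# A linear kernel exposes coef_, which RFE uses as the feature importance.
selector = RFECV(
    estimator=SVR(kernel="linear"),
    step=1,                    # drop one feature per elimination round
    cv=3,                      # 3-fold cross-validation, as in the text
    min_features_to_select=1,
)
selector.fit(X, y)

selected = np.flatnonzero(selector.support_)
print("selected feature indices:", selected)
print("elimination ranking (1 = kept):", selector.ranking_)
```

Each round removes the least important feature and the cross-validated score decides how many features to keep, so only a nested sequence of subsets is evaluated rather than every possible subset.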

Table 4 Recursive feature filter table

Applying the recursive feature elimination algorithm to the data of the three lines A, B and C of the waste incineration plant shows that the coal feeder, feeder, CO concentration, O₂ concentration, inlet secondary fan, dust bag pressure difference and furnace fault temperature are of relatively high importance. Combining the Pearson and RFE results on the historical data, six features (coal feeder, feeder, secondary fan, dust bag pressure difference, O₂ concentration and CO concentration) are selected as the input of the AR-LSTM prediction model.

3.2 AR-LSTM Model

The AR-LSTM model diagram constructed is shown in Fig. 1.

3.2.1 ARIMA

ARIMA, the Autoregressive Integrated Moving Average model, is a well-known time series prediction model. Its basic idea is to treat the historical data sequence formed over time as a random sequence whose variation trend can be approximated by an appropriate mathematical model. Once the model is established, the existing historical information can be used to predict unknown values at future moments [28]. Depending on whether the original sequence is stationary and on which terms the regression contains, the model family includes the moving average process (MA(q)), the autoregressive process (AR(p)), the autoregressive moving average process (ARMA(p, q)) and the autoregressive integrated moving average process (ARIMA(p, d, q)). Here AR stands for autoregressive and p is the number of autoregressive terms; MA stands for moving average and q is the number of moving average terms; d is the number of times the series is differenced to make it stationary [29].

ARIMA (p, d, q) can be expressed as follows:

$$ w_{t} = \Delta^{d} x_{t} = \left( {1 - L} \right)^{d} x_{t} $$

(2)

$$ w_{t} = \phi_{1} w_{t - 1} + \phi_{2} w_{t - 2} + \cdots + \phi_{p} w_{t - p} + \mu_{t} + \theta_{1} \mu_{t - 1} + \cdots + \theta_{q} \mu_{t - q} $$

(3)

In Eq. (2), d is the number of differences needed to transform a non-stationary series into a stationary one, $L$ is the lag operator, $\Delta$ is the difference operator, and $\Delta^{d} x_{t}$ is the d-th order difference sequence.

In Eq. (3), $w_{t}$ is the current value of the differenced series, $\phi_{i}$ are the autoregressive coefficients, $\theta_{i}$ are the moving average coefficients, and $\mu_{t}$ is white noise.
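To make Eqs. (2) and (3) concrete, the sketch below differences a series and estimates the autoregressive coefficients by ordinary least squares. It is a simplification that sets q = 0 (no moving average terms) and uses simulated data; the helper names `difference`, `fit_ar` and `forecast_next` are hypothetical, and a full ARIMA fit would normally rely on a dedicated statistics library.

```python
import numpy as np

def difference(x, d=1):
    """Apply the d-th order difference (1 - L)^d of Eq. (2)."""
    return np.diff(np.asarray(x, dtype=float), n=d)

def fit_ar(w, p):
    """Least-squares estimate of phi_1..phi_p in Eq. (3) with q = 0."""
    w = np.asarray(w, dtype=float)
    # Column i-1 holds the lag-i values w_{t-i} for t = p, ..., len(w) - 1.
    lags = np.column_stack([w[p - i : len(w) - i] for i in range(1, p + 1)])
    phi, *_ = np.linalg.lstsq(lags, w[p:], rcond=None)
    return phi

def forecast_next(x, phi):
    """One-step-ahead forecast of x under an ARIMA(p, 1, 0) model."""
    x = np.asarray(x, dtype=float)
    w = difference(x, d=1)
    p = len(phi)
    recent = w[-1 : -p - 1 : -1]          # w_t, w_{t-1}, ..., w_{t-p+1}
    return float(x[-1] + np.dot(phi, recent))

# Simulate an integrated AR(2) series with known coefficients.
rng = np.random.default_rng(0)
true_phi = np.array([0.5, -0.3])
noise = rng.normal(size=5000)
w = np.zeros(5000)
for t in range(2, 5000):
    w[t] = true_phi[0] * w[t - 1] + true_phi[1] * w[t - 2] + noise[t]
x = np.concatenate([[0.0], np.cumsum(w)])  # integrate once, so diff(x) == w

phi_hat = fit_ar(difference(x), p=2)
next_value = forecast_next(x, phi_hat)
print("estimated phi:", phi_hat)
```

The forecast is made on the differenced scale and then added back to the last observed value, which undoes the first-order differencing.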

The prediction of ARIMA is divided into the following steps:

(1)
Obtain time series data;
(2)
Plot the data and check whether the series is stationary. A non-stationary series is first differenced d times to make it stationary; a stationary series is modelled directly with ARMA(p, q).
(3)
Once the series is stationary, compute the autocorrelation function (ACF) and the partial autocorrelation function (PACF). Analysing them gives the optimal orders p and q.
(4)
Build the ARIMA model from the p, d and q obtained above. Then test the model on the existing historical series to ensure that the error between the actual and predicted values stays within the set threshold. If the error does not meet expectations, return to step (2).

Among them:

$$ ACF\left( {\text{k}} \right) = \rho_{k} = \frac{{Cov\left( {y_{t} ,y_{t - k} } \right)}}{{Var\left( {{\text{y}}_{t} } \right)}} $$

(4)

In Eq. (4), $y_{t}$ represents the value at time t, and $y_{t - k}$ represents the value at time t − k.
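A sample version of Eq. (4) can be written in a few lines of NumPy, as sketched below. It uses the common estimator that divides the lag-k sum of products by the lag-0 sum; the input series is a toy example.

```python
import numpy as np

def acf(y, max_lag):
    """Sample autocorrelation rho_k for k = 0..max_lag, following Eq. (4)."""
    y = np.asarray(y, dtype=float)
    dev = y - y.mean()
    denom = np.sum(dev ** 2)
    return np.array(
        [np.sum(dev[k:] * dev[: len(y) - k]) / denom for k in range(max_lag + 1)]
    )

rho = acf([1.0, 2.0, 3.0, 4.0, 5.0], max_lag=2)
print(rho)  # rho[0] is always 1
```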

3.2.2 LSTM

LSTM stands for Long Short-Term Memory. It was first proposed by Hochreiter and Schmidhuber as a special recurrent neural network (RNN) that addresses the vanishing gradient problem of ordinary RNNs. The network introduces a node structure different from that of ordinary neurons, together with the concept of control gates [30,31,32]. LSTM networks are suited to the classification, processing and prediction of time series data. The proposed node is called the LSTM cell, and its structure is shown in Fig. 2.

The LSTM unit [33, 34] has four inputs: the input itself and the control signals of the input gate, the forget gate and the output gate. The input gate controls whether writing is allowed, the forget gate controls whether the value of the memory unit needs to be updated, and the output gate controls whether output is allowed. The internal structure is shown in Fig. 3.

Z is the input, and Z_i, Z_f and Z_o are the control signals of the input gate, the forget gate and the output gate, respectively. The function f(x) is usually the sigmoid:

$$ f\left( x \right) = \frac{1}{{1 + e^{ - x} }} $$

(5)

The output of this function lies in the range [0, 1] and indicates how far a gate is open. g(x) and h(x) are activation functions. First, Z is activated to get g(Z), and Z_i is passed through the sigmoid to get f(Z_i); the two are multiplied to give g(Z) f(Z_i). Z_f is passed through the sigmoid to get f(Z_f), which is multiplied by the value c held in the memory unit from the previous time step to give c f(Z_f). The value of the memory unit is then updated to:

$$ c^{^{\prime}} = g\left( Z \right)f\left( {Z_{i} } \right) + cf\left( {Z_{f} } \right) $$

(6)

c′ is passed through the activation function to give h(c′), and Z_o is passed through the sigmoid to give f(Z_o); multiplying the two gives the output:

$$ a = h\left( {c^{^{\prime}} } \right)f\left( {Z_{o} } \right) $$

(7)

The input x_t at time t and the output h_{t−1} of the hidden layer neuron at time t − 1 together form the input of the hidden layer at time t. They are multiplied by different weight vectors to obtain the control signals Z_f, Z_i and Z_o of the three gates and the input value Z. The formulas are as follows:

$$ Z_{f} = \omega_{f} \left[ {h_{t - 1} ,x_{t} } \right] + b_{f} $$

(8)

$$ Z_{i} = \omega_{i} \left[ {h_{t - 1} ,x_{t} } \right] + b_{i} $$

(9)

$$ Z_{o} = \omega_{o} \left[ {h_{t - 1} ,x_{t} } \right] + b_{o} $$

(10)

$$ Z = \omega_{x} \left[ {h_{t - 1} ,x_{t} } \right] + b_{x} $$

(11)

Here $\omega$ denotes the weight vectors, and b_f, b_i, b_o and b_x are the biases of the corresponding connections. After the operation of the LSTM unit, the value c of the memory unit is updated according to Eq. (6) to get:

$$ c^{\prime } = g\left( {\omega_{x} \left[ {h_{t - 1} ,x_{t} } \right] + b_{x} } \right)f\left( {\omega_{i} \left[ {h_{t - 1} ,x_{t} } \right] + b_{i} } \right) + cf\left( {\omega_{f} \left[ {h_{t - 1} ,x_{t} } \right] + b_{f} } \right) $$

(12)

The output of hidden layer neurons is:

$$ \begin{aligned} h_{t} & = h\left( {c^{\prime } } \right)f\left( {Z_{o} } \right) \\ & { = }h\left( {g\left( {\omega_{x} \left[ {h_{t - 1} ,x_{t} } \right] + b_{x} } \right) \cdot f\left( {\omega_{i} \left[ {h_{t - 1} ,x_{t} } \right] + b_{i} } \right) + cf\left( {\omega_{{\text{f}}} \left[ {h_{t - 1} ,x_{t} } \right] + b_{f} } \right)} \right) \\ & \quad \cdot f\left( {\omega_{o} \left[ {h_{t - 1} ,x_{t} } \right] + b_{o} } \right) \\ \end{aligned} $$

(13)

From the above, the input of the LSTM includes not only the output h_{t−1} of the hidden layer neuron at the previous time step but also the value of the memory unit in the LSTM unit. The LSTM network can therefore effectively avoid vanishing gradients, remember long-term historical information, and fit long time series data more effectively [35]. As a result, LSTM neural networks are widely used in speech recognition, image recognition, sequence prediction and other areas and have achieved good results.
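The cell update of Eqs. (5) to (13) can be traced in a short NumPy sketch. The text does not fix the activation functions g and h, so the sketch assumes both are tanh; the layer sizes, the random initialisation and the helper names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    """Gate function f(x) of Eq. (5)."""
    return 1.0 / (1.0 + np.exp(-x))

def init_params(n_hidden, n_input, rng):
    """Small random weights and zero biases for the four linear maps (hypothetical)."""
    params = {}
    for name in ("f", "i", "o", "x"):
        params["w_" + name] = rng.normal(scale=0.1, size=(n_hidden, n_hidden + n_input))
        params["b_" + name] = np.zeros(n_hidden)
    return params

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM cell update; g and h are assumed to be tanh."""
    v = np.concatenate([h_prev, x_t])             # [h_{t-1}, x_t]
    z_f = params["w_f"] @ v + params["b_f"]       # Eq. (8), forget gate signal
    z_i = params["w_i"] @ v + params["b_i"]       # Eq. (9), input gate signal
    z_o = params["w_o"] @ v + params["b_o"]       # Eq. (10), output gate signal with bias b_o
    z = params["w_x"] @ v + params["b_x"]         # Eq. (11), cell input
    c_new = np.tanh(z) * sigmoid(z_i) + c_prev * sigmoid(z_f)   # Eqs. (6) and (12)
    h_new = np.tanh(c_new) * sigmoid(z_o)                        # Eqs. (7) and (13)
    return h_new, c_new

rng = np.random.default_rng(0)
n_hidden, n_input = 4, 6                          # six input features, four hidden units
params = init_params(n_hidden, n_input, rng)
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.normal(size=(10, n_input)):        # ten hypothetical time steps
    h, c = lstm_step(x_t, h, c, params)
print(h)
```

Because the memory value c is carried forward additively and only scaled by the forget gate, information can persist over many steps, which is the property the paragraph above refers to.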

3.2.3 AR-LSTM

The AR-LSTM model, based on the ARIMA model and the LSTM neural network, is used to predict the concentration of SO₂. On the one hand, the model uses ARIMA to predict from the linear relationship between the time series data and the SO₂ concentration data. On the other hand, it uses the LSTM neural network to predict from the features extracted in 3.1.2. Finally, the CRITIC weight assignment method is used to calculate the weights of the prediction results of the two models, and the two sets of predictions are combined accordingly.

Among them, the CRITIC assignment method is an objective weight assignment method. The basic idea is to determine the objective weight of each index based on contrast and conflict. The conflict between the j-th indicator and the other indicators is $\sum\limits_{k = 1}^{J} {\left( {1 - r_{k,j} } \right)}$, j = 1, 2, 3, …, J. Let C_j be the amount of information contained in the j-th evaluation index; C_j is expressed as:

$$ C_{j} = \sigma_{j} \sum\limits_{k = 1}^{J} {\left( {1 - r_{k,j} } \right)} ,\quad j = 1,2,3, \ldots ,J $$

(14)

Here $\sigma_{j}$ is the standard deviation of the j-th index and $r_{k,j}$ is the correlation coefficient between evaluation indexes k and j. The weight w_j of the j-th index is:

$$ w_{j} = \frac{{C_{j} }}{{\sum\nolimits_{k = 1}^{J} {C_{k} } }},\quad j = 1,2,3, \ldots ,J $$

(15)
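The CRITIC weights of Eqs. (14) and (15) can be sketched as follows. The text does not spell out how the method is applied to the two models, so this sketch assumes that each model's prediction series is treated as one indicator (J = 2); the prediction values are made up, and the min-max normalisation often used with CRITIC is omitted.

```python
import numpy as np

def critic_weights(scores):
    """CRITIC weights for the columns of `scores` (rows: samples, columns: indicators)."""
    scores = np.asarray(scores, dtype=float)
    sigma = scores.std(axis=0, ddof=1)        # contrast: standard deviation of each indicator
    r = np.corrcoef(scores, rowvar=False)     # r[k, j], correlation between indicators
    conflict = np.sum(1.0 - r, axis=0)        # sum over k of (1 - r_kj)
    c = sigma * conflict                      # Eq. (14)
    return c / c.sum()                        # Eq. (15)

# Hypothetical SO2 predictions from the two component models.
arima_pred = np.array([40.0, 42.0, 45.0, 43.0, 41.0])
lstm_pred = np.array([39.0, 44.0, 47.0, 42.0, 40.0])

preds = np.column_stack([arima_pred, lstm_pred])
weights = critic_weights(preds)
combined = preds @ weights                    # weighted combination of the two forecasts
print("weights:", weights)
print("combined:", combined)
```

With only two indicators the conflict term is the same for both, so the weights reduce to the ratio of the standard deviations; if the two series were perfectly correlated the conflict would be zero and the weights undefined.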

The AR-LSTM model consists of an input layer, a hidden layer and an output layer. The input vectors X₁ and X₂ are the time series data and the SO₂ concentration data, respectively. The input vector (X₃, …, X₉) contains the six feature values and the SO₂ concentration data. The activation function is ELU, adjacent layers are fully connected, and the final output Y_t is the SO₂ concentration. The model is trained with the Adam optimizer.

4 Experiment and Analysis

The experiments use the Python 3.7 programming language, the TensorFlow 1.13.1 deep learning framework and the Keras library.

4.1 ARIMA Modeling

Before modelling with ARIMA, the stationarity of the data set should be tested. If the series is stationary, no processing is required; if it is non-stationary, the data set must be differenced. We use the augmented Dickey-Fuller (ADF) unit root test to determine whether each data set is a stationary series. Table 5 shows the unit root test results for the two data sets.

Table 5 ADF test results

It can be seen from Table 5 that the ADF statistics of both data sets are greater than the three critical values, so the unit root hypothesis cannot be rejected and both data sets are non-stationary sequences. First-order differencing is used to make the two data sets stationary.

Based on the first-order differenced data, the orders of the ARIMA model are determined from the autocorrelation function (ACF) and the partial autocorrelation function (PACF). After calculation, hourly predictions are made with the ARIMA(2, 1, 0) model and day-by-day predictions with the ARIMA(1, 1, 0) model. Figure 4 shows the hourly and day-by-day prediction results, where blue is the true value and orange is the predicted value.

4.2 LSTM Modeling

The six features extracted in Section 3.1.2 (inlet coal feeder, inlet feeder, inlet secondary fan, inlet cloth bag dust removal pressure difference, outlet O₂ concentration and outlet CO concentration) are taken as the input of the LSTM neural network, and the output of the network is the SO₂ concentration. Because the experimental data set is small and a multi-layer neural network would easily overfit, an LSTM neural network with a single hidden layer is used for prediction. The loss function is the Mean Absolute Error (MAE), and the optimizer is Adam. To determine the number of neurons in the hidden layer, experiments are carried out with 25, 50, 75 and 100 neurons. The evaluation indexes are the root mean square error (RMSE), the coefficient of determination (R²) and the mean absolute error (MAE). The comparison results are shown in Table 6.

Table 6 Comparison table of different neuron number

Among them, R² is the coefficient of determination, which is the index to evaluate the good or bad fitting of the model. It is most commonly used to evaluate the regression model. The closer the value is to 1, the better the fitted equation is. RMSE is the root mean square error, which is used to detect the deviation between the predicted value and the actual value of the model. RMSE is often used as the measurement standard of machine learning model prediction results. MAE is the mean absolute error, the mean of the absolute error, which can better reflect the true situation of the predicted value error.
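The three indicators follow their standard definitions and can be computed as in the sketch below. The actual and predicted values are toy numbers chosen only for illustration.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rmse(y_true, y_pred):
    """Root mean square error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))

# Toy actual and predicted SO2 concentrations.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]
print(r2_score(actual, predicted), rmse(actual, predicted), mae(actual, predicted))
```

RMSE penalises large errors more heavily than MAE because the errors are squared before averaging, which is why the two are usually reported together.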

As shown in Table 6, increasing the number of neurons does not necessarily reduce the error and may even increase it. The error is smallest with 50 neurons, so the number of neurons in the hidden layer of the LSTM neural network is set to 50. The model is trained for 50 epochs with a batch size of 72. Figure 5 shows the loss function values during the training of the LSTM neural network.

After training, the LSTM neural network makes hourly and daily predictions on the two test sets. The predicted SO₂ concentrations are output and visualized: the black curve is the predicted value, and the red curve is the actual value. The hourly and daily prediction results of the LSTM neural network model are shown in Fig. 6.

4.3 AR-LSTM Modeling

We use the AR-LSTM model, based on ARIMA and the LSTM neural network, to predict SO₂ concentration. In this model the ARIMA model and the LSTM neural network run in parallel on the same data set: ARIMA performs the time series prediction and the LSTM performs the multi-dimensional correlation factor prediction. Because the experimental data are real data from a waste incineration plant, neither ARIMA time series prediction alone nor LSTM multi-dimensional correlation factor prediction alone achieves the ideal prediction effect. The AR-LSTM model therefore predicts the SO₂ concentration with each model separately. The CRITIC weight assignment method then calculates the weights of the two models, each in the range [0, 1], and the two sets of predictions are combined according to these weights to give the final prediction. Hourly forecasts cover ‘2019/9/1 00:00’–‘2019/9/14 3:00’, and daily forecasts cover ‘2019/9/1’–‘2019/9/24’. Figures 7 and 8 show the comparison results of the four models.

The ARIMA model, the LSTM neural network model and the AR-LSTM model are compared on hourly and daily prediction using the three evaluation indexes R², RMSE and MAE. The results for each model are shown in Table 7.

Table 7 Evaluation index result table

It can be seen from Table 7 that all three indexes of the AR-LSTM model are better than those of the single ARIMA model and the single LSTM neural network model, for both hourly and daily forecasts. The indicators of LSTM-RNN are close to those of AR-LSTM, but AR-LSTM is still slightly better. The hourly prediction error of the models is lower than their daily prediction error: there is more training data for hourly prediction than for daily prediction, which leads to the higher accuracy of the AR-LSTM model in hourly prediction. In summary, the results support the effectiveness of the AR-LSTM model. On this basis, we use the model to predict SO₂ emissions in the next few weeks, and the prediction results are shown in Fig. 9.

5 Conclusions

An AR-LSTM model is constructed to predict the concentration of SO₂ hourly and daily and to predict SO₂ emissions in the next few weeks. Data processing mainly covers three aspects: missing value processing, data marking and feature extraction. Prediction experiments are then carried out with the ARIMA model, the LSTM model, the LSTM-RNN model and the AR-LSTM model on the same data sets. Finally, R², RMSE and MAE are used as evaluation indexes for each model, and the prediction results of the models are compared visually. Based on the analysis of the predicted results and the comparative experiments, the following conclusions can be drawn:

(1) Judging from the prediction results, hourly prediction is more accurate than daily prediction. The hourly data set contains far more samples than the daily one, so in hourly prediction the model is trained better and fits the data better. The daily model is trained on comparatively few samples, which lowers its prediction accuracy.
(2) According to the comparative experimental results, the coefficient of determination (R²) of the AR-LSTM model is higher than that of the single ARIMA model and the LSTM neural network, whether the prediction is hourly or daily. The root mean square error (RMSE) and mean absolute error (MAE) of the AR-LSTM model are smaller than those of the ARIMA model and the LSTM neural network. The indicators of LSTM-RNN are close to those of AR-LSTM, but AR-LSTM is still slightly better. This supports, to some extent, the effectiveness of the AR-LSTM combined model.
(3) Finally, having established the effectiveness of the AR-LSTM model, we used it to predict SO₂ emissions in the next few weeks and visualized the predicted results.

Acknowledgements

We are grateful for the support of the National Natural Science Foundation of China (Grant No. 61772321) and the National Natural Science Foundation of Shandong (Grant No. ZR202011020044).

Author information

Authors and Affiliations

School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China
Jie Ju & Fang’ai Liu
Huawei Technologies Co., Ltd., Shenzhen, China
Ke’nan Liu

Corresponding author

Correspondence to Fang’ai Liu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Ju, J., Liu, K. & Liu, F. Prediction of SO₂ Concentration Based on AR-LSTM Neural Network. Neural Process Lett 55, 5923–5941 (2023). https://doi.org/10.1007/s11063-022-11119-7

Accepted: 10 December 2022
Published: 24 December 2022
Issue Date: October 2023
DOI: https://doi.org/10.1007/s11063-022-11119-7
