Introduction

With climate change becoming increasingly evident over the last decade, reducing carbon emissions is an urgent need (Yang et al. 2022). Wastewater treatment is one of the most energy-intensive industries in China; it consumed 18.4 billion kWh in 2020, and its consumption continues to rise every year (Chang et al. 2021). Large amounts of greenhouse gases (GHGs), such as CO2, N2O and CH4, are generated during wastewater treatment, which has been identified as an anthropogenic source of GHG emissions (Yoshida et al. 2014). Of the total energy consumed by wastewater treatment plants (WWTPs), the electricity used for aeration accounts for 70–80%, followed by pumping and chemicals (Yang et al. 2021). At present, aeration control in Chinese WWTPs is coarse and largely manual, and excess aeration is routinely supplied, generating unnecessary energy consumption (Yang et al. 2021), especially in small-scale WWTPs. Most European WWTPs have installed supervisory control and data acquisition (SCADA) systems to control the aeration process precisely, whereas most Chinese WWTPs are equipped only with facilities to check whether the effluent meets the imposed discharge standards. As discharge standards become stricter, the demand for energy savings grows. It is therefore necessary to determine the energy saving potential of existing equipment while still meeting these standards.

During the last decade, with the rapid development of information technologies such as big data and artificial intelligence (AI), new energy saving solutions have been provided by constructing intelligent management systems (Wang et al. 2022). Machine learning (ML) is an AI technology used to recognize patterns in data for prediction or classification. Because it achieves high precision from data relationships alone, ML is becoming increasingly popular in fields such as effluent prediction and process optimization (Picos-Benítez et al. 2020). Since wastewater treatment is a nonlinear process, simple mechanistic models are often difficult to construct, and a data-driven ML approach is preferable (Bagherzadeh et al. 2021). Many studies have verified the feasibility of ML in wastewater treatment tasks. Nourani et al. (2021) proposed an approach based on black-box models, including a feedforward neural network, support vector regression (SVR) and an adaptive neuro-fuzzy inference system, to predict effluent biological oxygen demand (BOD5) and chemical oxygen demand (COD). Ly et al. (2022) investigated and compared six ML algorithms for predicting effluent total phosphorus (TP). El-Rawy et al. (2021) compared the performance of different models for predicting the removal efficiencies of total suspended solids, COD, BOD5 and NH4+–N (ammonia nitrogen). Wan et al. (2022) proposed a model based on a convolutional neural network, weight-sharing long short-term memory and Gaussian process regression for paper-making wastewater treatment, which exhibited comprehensive forecasting ability. Das et al. (2021) used the standard mean absolute error (MAE) and root mean square error (RMSE) metrics to compare four ML algorithms and selected a gated recurrent unit as the best model. Żyłka et al. (2020) evaluated a least-squares linear regression model for predicting electricity consumption and found that the main explanatory parameters were the organic loading rate and temperature.

However, to date, research using ML in wastewater treatment has mainly focused on effluent quality, while few studies have addressed energy consumption. In addition, some factors that determine energy consumption have not been considered, and their influence has not yet been fully evaluated; moreover, model performance is strongly determined by the accuracy of the input data.

In this study, a mapping relationship between energy consumption and management parameters was established, and an energy-saving strategy for WWTPs was developed based on a genetic algorithm. Daily operation parameters were ranked through regression analysis, and the top-ranking parameters were selected as inputs to an XGB model for predicting energy consumption. Furthermore, an energy-saving control strategy was evaluated, which is expected to offer an instant energy-saving strategy for WWTPs in practical applications.

Materials and methodology

Background of the target WWTP

An urban WWTP (anaerobic-anoxic-aerobic process) located in Henan Province (China) had a designed flow rate of 30,000 m3 day−1 and followed the Chinese discharge standard GB 18918-2002. The sludge was dewatered by a belt filter press and then disposed of in a sanitary landfill (Fig. 1). Operation data were collected for 353 days from 1st January to 31st December 2020 and consisted of the influent flow rate (IFR), influent NH4+–N concentration (IAN), influent COD (ICOD), influent TN (ITN), influent TP (ITP), effluent NH4+–N concentration (EAN), effluent COD (ECOD), effluent TN (ETN), effluent TP (ETP), mixed liquor suspended solids (MLSS) of the aerobic tank, DO at the end of the aerobic tank (DO), ORP at the end of the anoxic tank (ORP), organic loading rate (OLR), NH4+–N loading rate (ANLR) and energy consumption per cubic metre (EC). The units and statistics of each feature are shown in Table 1.

Fig. 1
figure 1

Treatment process of the WWTP

Table 1 Dataset

Feature selection

Ordinary least squares

Ordinary least squares (OLS) is a method that can be used for variable selection. For a dataset \(D=\{\left({{\varvec{x}}}_{1},{y}_{1}\right),\left({{\varvec{x}}}_{2},{y}_{2}\right),\dots ,\left({{\varvec{x}}}_{n},{y}_{n}\right)\}\), where \({\varvec{x}}\in {\mathbb{R}}^{d},y\in {\mathbb{R}}\), its basic expression is given in Eq. (1). The fitting criterion is to minimize the sum of squared residuals between \(y\) and \(f\left({\varvec{x}}\right)\) (Eq. 2), and the parameters with nonzero coefficients are selected.

$$ \begin{array}{*{20}c} {f\left( {\varvec{x}} \right) = {\varvec{\omega}}^{T} {\varvec{x}} + \beta } \\ \end{array} $$
(1)
$$ \begin{array}{*{20}c} {\min \mathop \sum \limits_{i = 1}^{n} \left( {y_{i} - f\left( {{\varvec{x}}_{i} } \right)} \right)^{2} } \\ \end{array} $$
(2)

where \({\varvec{\omega}}\) and \(\beta \) are the coefficients to be estimated.
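As a minimal numerical sketch (not the software used in the study), Eq. (2) can be solved with NumPy's least-squares routine; the toy data here are hypothetical:

```python
import numpy as np

def ols_fit(X, y):
    """Fit f(x) = w^T x + beta by minimizing the sum of squared
    residuals (Eq. 2) via NumPy's least-squares solver."""
    Xa = np.column_stack([X, np.ones(len(X))])  # append intercept column
    coef, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    return coef[:-1], coef[-1]  # (w, beta)

# Hypothetical noiseless data: y = 2*x1 - 3*x2 + 1, so OLS should recover it.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 2 * X[:, 0] - 3 * X[:, 1] + 1
w, beta = ols_fit(X, y)
```

With noiseless data the recovered coefficients match the generating ones exactly (up to floating-point error); with real plant data they would only approximate the underlying relationship.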

Least absolute shrinkage and selection operator

To avoid overfitting during OLS, a penalty function with the L1 norm is added to the objective function to simplify the model structure and decrease the empirical risk; this is the least absolute shrinkage and selection operator (Lasso). Compared with ridge regression, which adopts the L2 norm, the Lasso more easily obtains sparse solutions and thus selects features.

$$ \begin{array}{*{20}c} {\min \mathop \sum \limits_{i = 1}^{n} \left( {y_{i} - f\left( {x_{i} } \right)} \right)^{2} + \lambda \left\| {\varvec{\omega}} \right\|_{1} } \\ \end{array} $$
(3)
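A minimal coordinate-descent sketch of Eq. (3), with a hypothetical dataset in which only the first feature matters; the soft-threshold level λ/2 follows from the subgradient of the L1 term:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for Eq. (3); a sketch only,
    not the solver used in the study."""
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(d):
            r = y - X @ w + X[:, j] * w[j]      # partial residual excluding feature j
            rho = X[:, j] @ r
            # soft-thresholding: coefficients below the threshold are set to 0
            w[j] = np.sign(rho) * max(abs(rho) - lam / 2, 0) / col_sq[j]
    return w

# Hypothetical data: only feature 0 drives y, so the Lasso should zero out the rest.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = 3 * X[:, 0] + rng.normal(scale=0.1, size=100)
w = lasso_cd(X, y, lam=20.0)
```

The irrelevant coefficients collapse toward zero, which is exactly the sparsity property that makes the Lasso usable for feature selection.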

Smoothly clipped absolute deviation

The smoothly clipped absolute deviation (SCAD) approach was proposed by Fan and Li (2001). Compared with the Lasso, this method reduces the bias of parameter estimation. The principle behind the imposed penalty is similar to that of the minimax concave penalty (MCP), but the transition between regimes differs. The penalty function of SCAD is defined in Eq. (4).

$$ \begin{array}{*{20}c} {p_{\lambda ,\gamma } \left( \theta \right) = \left\{ {\begin{array}{*{20}l} {\lambda \theta ,} \hfill & {\theta \le \lambda } \hfill \\ { - \frac{{\theta^{2} - 2\gamma \lambda \theta + \lambda^{2} }}{{2\left( {\gamma - 1} \right)}},} \hfill & { \lambda < \theta \le \gamma \lambda } \hfill \\ {\frac{{\left( {\gamma + 1} \right)\lambda^{2} }}{2}, } \hfill & {\theta > \gamma \lambda } \hfill \\ \end{array} } \right.} \\ \end{array} $$
(4)

where \(\lambda \ge 0\), \(\gamma >2\) and \(\theta \ge 0\).

The objective function used by SCAD is as follows:

$$ \begin{array}{*{20}c} {\min \mathop \sum \limits_{i = 1}^{n} \left( {y_{i} - f\left( {x_{i} } \right)} \right)^{2} + \mathop \sum \limits_{j = 1}^{d} p_{\lambda ,\gamma } \left( {\left| {\omega_{j} } \right|} \right)} \\ \end{array} $$
(5)
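Equation (4) translates directly into a small NumPy function; the values of λ and γ below are illustrative only:

```python
import numpy as np

def scad_penalty(theta, lam, gamma):
    """SCAD penalty p_{lam,gamma}(theta) for theta >= 0 (lam >= 0, gamma > 2)."""
    theta = np.abs(theta)
    return np.where(
        theta <= lam,
        lam * theta,                                             # linear part
        np.where(
            theta <= gamma * lam,                                # quadratic taper
            -(theta**2 - 2 * gamma * lam * theta + lam**2) / (2 * (gamma - 1)),
            (gamma + 1) * lam**2 / 2,                            # constant plateau
        ),
    )
```

The penalty grows linearly up to λ, tapers quadratically between λ and γλ, and is constant beyond γλ; this flattening is what reduces the estimation bias on large coefficients relative to the Lasso.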

Cp criterion

Mallows' Cp is used for variable selection based on the residual sum of squares obtained from an OLS regression model (James et al. 2021). The independent-variable subset that minimizes Cp is selected, and the regression equation corresponding to this subset is the optimal regression equation.

$$ \begin{array}{*{20}c} {C_{{\text{p}}} = \frac{{{\text{SSR}}_{d} }}{{\hat{\sigma }^{2} }} - n + 2d} \\ \end{array} $$
(6)

where \({\widehat{\sigma }}^{2}\) is an estimate of the variance of the residuals, \(d\) is the number of parameters, and \(n\) is the sample size.
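Equation (6) is simple enough to evaluate directly; the SSR values below are hypothetical and only illustrate that a larger subset can win if it reduces the residual sum of squares enough:

```python
def mallows_cp(ssr_d, sigma2_hat, n, d):
    """Mallows' Cp of Eq. (6): SSR_d / sigma2_hat - n + 2*d."""
    return ssr_d / sigma2_hat - n + 2 * d

# Hypothetical comparison of two subsets; the lower Cp is preferred.
cp_small = mallows_cp(ssr_d=120.0, sigma2_hat=1.0, n=100, d=3)   # 3 predictors
cp_large = mallows_cp(ssr_d=95.0, sigma2_hat=1.0, n=100, d=10)   # 10 predictors
```

Here the ten-predictor subset attains the lower Cp despite the 2d complexity term, because its residual sum of squares is sufficiently smaller.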

The above methods are based on a linear regression model. By adding a penalty term, the coefficients of the input variables are shrunk to varying degrees, which enables feature selection.

Akaike information criterion

The Akaike information criterion (AIC) is widely used to evaluate the fitness of statistical models (Eq. 7). The first term reflects goodness of fit, and the second term penalizes the number of parameters. The best model is the one with the minimum AIC (Ingdal et al. 2019).

$$ \begin{array}{*{20}c} {{\text{AIC}} = - 2\ln \hat{L} + 2d} \\ \end{array} $$
(7)

where \(\widehat{L}\) is the maximized likelihood and \(d\) is the number of parameters.

Bayesian information criterion

Similar to the AIC, the Bayesian information criterion (BIC) is also a basic criterion for model selection, striking a balance between simplicity and mapping ability. The standard expression of the BIC is given in Eq. (8). Different from the AIC, its second term depends on both the number of parameters and the sample size. The model with the minimum BIC achieves the best balance between simplicity and mapping ability (Liu et al. 2022b).

$$ \begin{array}{*{20}c} {{\text{BIC}} = - 2\ln \hat{L} + d\ln n} \\ \end{array} $$
(8)

where \(\widehat{L}\) is the maximized likelihood, \(d\) is the number of parameters, and \(n\) is the sample size.
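For Gaussian errors, \(-2\ln \widehat{L}\) reduces to \(n\ln({\text{SSR}}/n)\) up to additive constants, giving a convenient form for comparing Eqs. (7) and (8); the SSR value below is hypothetical:

```python
import math

def aic_gaussian(ssr, n, d):
    """Eq. (7) with -2 ln L_hat replaced by n*ln(SSR/n)
    (Gaussian errors, additive constants dropped)."""
    return n * math.log(ssr / n) + 2 * d

def bic_gaussian(ssr, n, d):
    """Eq. (8) under the same simplification; the d*ln(n) term
    penalizes parameters more heavily than AIC once n > e^2."""
    return n * math.log(ssr / n) + d * math.log(n)

# With n = 353 daily records, BIC charges ln(353) ≈ 5.9 per parameter vs AIC's 2.
a = aic_gaussian(ssr=100.0, n=353, d=8)
b = bic_gaussian(ssr=100.0, n=353, d=8)
```

For the 353-day dataset used here, the BIC therefore prefers more parsimonious models than the AIC does.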

Minimum redundancy–maximum relevance

The minimum redundancy-maximum relevance (mRMR) algorithm addresses the problem that, owing to redundant variables, the feature subset that maximizes the correlation with the target variable does not necessarily yield the best prediction accuracy (Ding and Peng 2003). Mutual information is used to measure the correlation between two variables. The mutual information \(I\) of two discrete random variables \(x\) and \(y\) is defined as follows:

$$ \begin{array}{*{20}c} {I\left( {x,y} \right) = \mathop \sum \limits_{i,j} p\left( {x_{i} ,y_{j} } \right)\log \frac{{p\left( {x_{i} ,y_{j} } \right)}}{{p\left( {x_{i} } \right)p\left( {y_{j} } \right)}}} \\ \end{array} $$
(9)

where \(p(x,y)\) is the joint probability distribution of the two variables \(x\) and \(y\), and \(p(x)\) and \(p(y)\) are the marginal probabilities of \(x\) and \(y\), respectively.
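Equation (9) can be evaluated from a joint distribution table; the two toy tables below show the extreme cases of independence (I = 0) and a deterministic relation (I = ln 2 nats):

```python
import numpy as np

def mutual_info(p_xy):
    """Mutual information of Eq. (9), in nats, from a joint
    distribution table p_xy[i, j] = p(x_i, y_j)."""
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal p(x)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal p(y)
    mask = p_xy > 0                         # skip zero-probability cells
    return float((p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])).sum())

# Independent variables carry zero mutual information...
p_indep = np.outer([0.5, 0.5], [0.3, 0.7])
# ...while a deterministic relation carries the full entropy of x.
p_dep = np.array([[0.5, 0.0], [0.0, 0.5]])
```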

For the mRMR algorithm, the mutual information \(I(x,c)\) is used to find the feature subset \(S\) among the \(m\) features that is most closely related to the target variable \(c\).

$$ \begin{array}{*{20}c} {\max D\left( {S,c} \right),D = \frac{1}{\left| S \right|}\mathop \sum \limits_{{x_{i} \in S}} I\left( {x_{i} ,c} \right)} \\ \end{array} $$
(10)

The minimum redundant feature condition is:

$$ \begin{array}{*{20}c} {\min R\left( S \right),R = \frac{1}{{\left| S \right|^{2} }}\mathop \sum \limits_{{x_{i} ,x_{j} \in S}} I\left( {x_{i} ,x_{j} } \right)} \\ \end{array} $$
(11)

Then, the maximal correlation-minimal redundancy feature set \(S\) is:

$$ \begin{array}{*{20}c} {{\text{mRMR}} = \max \left[ {\frac{1}{\left| S \right|}\mathop \sum \limits_{{x_{i} \in S}} I\left( {x_{i} ,c} \right) - \frac{1}{{\left| S \right|^{2} }}\mathop \sum \limits_{{x_{i} ,x_{j} \in S}} I\left( {x_{i} ,x_{j} } \right)} \right]} \\ \end{array} $$
(12)

By selecting the subset of variables that maximizes this criterion, redundant variables are eliminated while relevant ones are retained.
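A common greedy approximation of Eq. (12) adds, at each step, the feature with the best relevance-minus-mean-redundancy score; the mutual-information values below are hypothetical:

```python
import numpy as np

def mrmr_select(relevance, redundancy, k):
    """Greedy mRMR sketch: relevance[i] = I(x_i, c),
    redundancy[i, j] = I(x_i, x_j). Selects k features."""
    selected = [int(np.argmax(relevance))]            # start with max relevance
    remaining = set(range(len(relevance))) - set(selected)
    while len(selected) < k:
        best, best_score = None, -np.inf
        for i in sorted(remaining):
            # relevance minus average redundancy with already-selected features
            score = relevance[i] - redundancy[i, selected].mean()
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical MI values: feature 2 is relevant but highly redundant with feature 0.
rel = np.array([0.9, 0.4, 0.8])
red = np.array([[0.0, 0.1, 0.7],
                [0.1, 0.0, 0.1],
                [0.7, 0.1, 0.0]])
chosen = mrmr_select(rel, red, k=2)
```

Feature 1 is chosen over the more relevant feature 2 precisely because feature 2 duplicates information already carried by feature 0.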

Model construction

XGB

Extreme gradient boosting (XGB) is an optimized gradient boosting algorithm (Chen and Guestrin 2016). When a tree is added, a new function \(f\left(x\right)\) is learned to fit the residual of the previous prediction. Once \(K\) trained trees are obtained, each sample falls to a leaf node in every tree, and every leaf corresponds to a score. The predicted value is the summation of the scores produced by the different trees, which is calculated as follows:

$$ \begin{array}{*{20}c} {\hat{y} = \mathop \sum \limits_{k = 1}^{K} f_{k} \left( {x_{i} } \right)} \\ \end{array} $$
(13)

where \(K\) is the number of trees and \({f}_{k}\left({x}_{i}\right)\) is the score of the k-th tree for sample \({x}_{i}\).

The objective function consists of a loss function and a regularization penalty:

$$ \begin{array}{*{20}c} {{\text{Obj}}\left( \theta \right) = \mathop \sum \limits_{i = 1}^{n} l\left( {y_{i} ,\widehat{{y_{i} }}} \right) + \mathop \sum \limits_{k = 1}^{K} \Omega \left( {f_{k} } \right)} \\ \end{array} $$
(14)

where \(l\left({y}_{i},{\widehat{y}}_{i}\right)\) is the error of the i-th sample and \(\Omega \left({f}_{k}\right)\) is the regularization penalty term of the k-th tree.

$$ \Omega \left( {f_{k} } \right) = \gamma T + \frac{1}{2}\lambda \mathop \sum \limits_{j = 1}^{T} \omega_{j}^{2} $$
(15)

where \(T\) is the total number of leaf nodes in the k-th tree, \({\omega }_{j}\) is the weight of the j-th leaf node, and \(\gamma \) and \(\lambda \) are regularization coefficients: \(\gamma \) penalizes the number of leaf nodes, and \(\lambda \) keeps the leaf weights small.

For the t-th iteration, the objective function can be expressed as follows:

$$ {\text{Obj}}^{\left( t \right)} = \sum\limits_{i = 1}^{n} {l\left[ {y_{i} ,\hat{y}_{i}^{{\left( {t - 1} \right)}} + f_{t} \left( {x_{i} } \right)} \right]} + \Omega \left( {f_{t} } \right) + C $$
(16)

where \({f}_{t}\left({x}_{i}\right)\) is the newly added t-th tree and \(C\) is the complexity of the previous t − 1 trees, that is, \(C={\sum }_{i=1}^{t-1}\Omega \left({f}_{i}\right)\).

Treating \(l\left({y}_{i},{\widehat{y}}_{i}^{\left(t-1\right)}\right)\) and \(C\) as constants and applying a second-order Taylor expansion to the loss function, the objective can be approximated as:

$$ \begin{array}{*{20}c} {{\text{Obj}}^{\left( t \right)} \simeq \mathop \sum \limits_{j = 1}^{T} \left[ {G_{j} w_{j} + \frac{1}{2}\left( {H_{j} + \lambda } \right)w_{j}^{2} } \right] + \gamma T} \\ \end{array} $$
(17)

where \({G}_{j}={\sum }_{i\in {I}_{j}}{\partial }_{{\widehat{y}}^{\left(t-1\right)}}l\left({y}_{i},{\widehat{y}}^{\left(t-1\right)}\right)\), \({H}_{j}={\sum }_{i\in {I}_{j}}{\partial }_{{\widehat{y}}^{\left(t-1\right)}}^{2}l\left({y}_{i},{\widehat{y}}^{\left(t-1\right)}\right)\), and \({I}_{j}\) is the set of samples assigned to leaf \(j\).

When the partial derivative of the objective function \({{\text{Obj}}}^{\left(t\right)}\) with respect to \({\omega }_{j}\) is set to zero, the optimal weight is obtained as follows:

$$ \begin{array}{*{20}c} {\omega_{j}^{*} = - \frac{{G_{j} }}{{H_{j} + \lambda }}} \\ \end{array} $$
(18)

The optimal objective function value is

$$ \begin{array}{*{20}c} {{\text{Obj}}^{\left( t \right)} = - \frac{1}{2}\mathop \sum \limits_{j = 1}^{T} \frac{{G_{j}^{2} }}{{H_{j} + \lambda }} + \gamma T} \\ \end{array} $$
(19)
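Equations (18) and (19) are easy to verify numerically from per-leaf gradient statistics; the G and H values below are illustrative:

```python
import numpy as np

def leaf_weights_and_obj(G, H, lam, gamma):
    """Optimal leaf weights (Eq. 18) and objective value (Eq. 19)
    from per-leaf gradient sums G_j and Hessian sums H_j."""
    G, H = np.asarray(G, float), np.asarray(H, float)
    w = -G / (H + lam)                                    # Eq. (18)
    obj = -0.5 * np.sum(G**2 / (H + lam)) + gamma * len(G)  # Eq. (19), T = len(G)
    return w, obj

# Two leaves under squared-error loss (H_j = number of samples in leaf j).
w, obj = leaf_weights_and_obj(G=[6.0, -4.0], H=[3.0, 1.0], lam=1.0, gamma=0.5)
```

A larger λ shrinks the weights toward zero, and a larger γ raises the objective per leaf, discouraging further splits.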

Multilayer perceptron artificial neural network

As a robust ML technology, the multilayer perceptron artificial neural network (MLPANN) model has been widely used for prediction in various energy systems. The basic MLPANN consists of three layers: an input layer, a hidden layer and an output layer (Faegh et al. 2021). Given a series of characteristics \(X=({x}_{1}, {x}_{2},...)\) and a target \(Y\), a multilayer perceptron can learn the relationships between the features and targets for classification or regression purposes.

Light gradient boosting machine

The light gradient boosting machine (LightGBM) is a distributed gradient boosting framework based on a decision tree algorithm. For a given training set, LightGBM obtains a strong learner by combining multiple classification and regression trees (Sun et al. 2022).

Support vector regression

Support vector regression (SVR) is a powerful approach for problems with small sample sizes and high dimensionality (Huang et al. 2022). The SVR algorithm seeks a regression hyperplane that minimizes the deviation of all data points in the set from that plane.

Model evaluation

To evaluate different models in terms of energy consumption prediction, the coefficient of determination (R2), MAE, mean absolute percentage error (MAPE), and RMSE can be calculated as follows:

$$ \begin{array}{*{20}c} {R^{2} = 1 - \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {y_{i} - \hat{y}_{i} } \right)^{2} }}{{\mathop \sum \nolimits_{i = 1}^{n} \left( {y_{i} - \frac{1}{n}\mathop \sum \nolimits_{j = 1}^{n} y_{j} } \right)^{2} }}} \\ \end{array} $$
(20)
$$ \begin{array}{*{20}c} {{\text{RMSE}} = \sqrt {\frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left( {\hat{y}_{i} - y_{i} } \right)^{2} } } \\ \end{array} $$
(21)
$$ \begin{array}{*{20}c} {{\text{MAE}} = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left| {\hat{y}_{i} - y_{i} } \right|} \\ \end{array} $$
(22)
$$ \begin{array}{*{20}c} {{\text{MAPE}} = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left| {\frac{{\hat{y}_{i} - y_{i} }}{{y_{i} }}} \right| \times 100\% } \\ \end{array} $$
(23)

where \(n\) is the sample size, \({y}_{i}\) is the actual value, \({\widehat{y}}_{i}\) is the predicted value and \(i\) is the index.

R2 is a classic indicator of the agreement between actual and predicted values. Both the MAE and RMSE are expressed in the units of the target variable, but the RMSE magnifies larger errors. A MAPE of 0% indicates a perfect model, while a value greater than 100% indicates an inferior one.
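Equations (20)-(23) can be sketched in a few lines of NumPy:

```python
import numpy as np

def regression_metrics(y, y_hat):
    """R2, RMSE, MAE and MAPE of Eqs. (20)-(23)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return {
        "R2": 1 - ss_res / ss_tot,
        "RMSE": float(np.sqrt(np.mean((y_hat - y) ** 2))),
        "MAE": float(np.mean(np.abs(y_hat - y))),
        "MAPE": float(np.mean(np.abs((y_hat - y) / y)) * 100),  # percent
    }

m = regression_metrics([1.0, 2.0, 4.0], [1.0, 2.0, 3.0])
```

Note that the MAPE is undefined when any actual value is zero, which is not an issue for strictly positive quantities such as EC.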

SEGA

The genetic algorithm (GA) is a heuristic algorithm inspired by Darwinian evolution that simulates natural selection through reproduction, crossover and mutation. During the evolutionary process, individuals with low fitness are eliminated. Through repeated selection and these three genetic operators, a near-optimal result can be obtained. However, it has been proven that the canonical GA, which uses only the selection, crossover and mutation operators with crossover and mutation probabilities in (0, 1), does not converge to the global optimum. Thus, the strengthened elitist GA (SEGA) adopts an elitism strategy that copies the best individual directly into the next generation without applying the crossover operator (Fig. 2).

Fig. 2
figure 2

Flow chart of the strengthen elitist genetic algorithm
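A toy real-coded sketch of the elitism idea (tournament selection, uniform crossover, Gaussian mutation; not the study's SEGA implementation), minimizing a hypothetical quadratic "energy" surface:

```python
import random

def sega_minimize(f, bounds, pop=30, gens=60, pc=0.9, pm=0.2, seed=0):
    """Elitist GA sketch: the best individual is copied unchanged into
    each new generation, so fitness never regresses."""
    rnd = random.Random(seed)
    P = [[rnd.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop)]
    for _ in range(gens):
        elite = min(P, key=f)[:]              # elitism: best survives unchanged
        nxt = [elite]
        while len(nxt) < pop:
            a = min(rnd.sample(P, 3), key=f)  # tournament selection
            b = min(rnd.sample(P, 3), key=f)
            if rnd.random() < pc:             # uniform crossover
                child = [x if rnd.random() < 0.5 else y for x, y in zip(a, b)]
            else:
                child = a[:]
            if rnd.random() < pm:             # Gaussian mutation of one gene
                j = rnd.randrange(len(bounds))
                lo, hi = bounds[j]
                child[j] = min(max(child[j] + rnd.gauss(0, 0.3), lo), hi)
            nxt.append(child)
        P = nxt
    return min(P, key=f)

# Hypothetical objective with optimum at (1, 2), standing in for the XGB-based EC.
best = sega_minimize(lambda v: (v[0] - 1) ** 2 + (v[1] - 2) ** 2,
                     bounds=[(-5, 5), (-5, 5)])
```

Because the elite is never crossed or mutated, the best fitness is monotonically non-increasing across generations, which is the convergence property the canonical GA lacks.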

Results and discussion

Parameter selection results

If all the available parameters were used as model inputs, the model complexity would be very high, possibly resulting in long training times and poor performance. Therefore, it is necessary to select the inputs first, which is crucial to model performance (Chu et al. 2009). Both the linear and nonlinear selection methods described above were used to select the input parameters (Table 2). The eight parameters with the highest selection frequencies were chosen as inputs, and they can be classified into three categories: water quantity (IFR), water quality (ETN, IAN, ITP and ETP) and management regulation (DO, ANLR and ORP). Furthermore, a Kendall correlation analysis was carried out between all parameters and energy consumption to evaluate these results (Fig. 3). The absolute correlation coefficients of the selected parameters ranked near the top, confirming that the selection was reasonable. Among the eight selected parameters, ETP and ANLR had limited correlations with energy consumption (0 < r < 0.2) (Khamis 2008).

Table 2 Parameter selection frequency
Fig. 3
figure 3

Heatmap of the Kendall correlation coefficient

In terms of water quantity, the IFR exhibited a strong negative correlation with EC. This can be attributed to the fact that most equipment cannot operate at its energy-efficient design point when the IFR deviates from the designed value (Hanna et al. 2018). In addition, poor management and limited regulation may also account for the excessive energy consumption of small-scale WWTPs (Vaccari et al. 2018).

In terms of water quality, the ETN was closely related to nitrogen removal performance. The conventional nitrogen removal process consists of nitrification and denitrification. To achieve sufficient nitrification, a large amount of energy is consumed for aeration; once the required oxygen has been supplied, further aeration not only wastes energy but also increases the effluent NO3–N (Liu et al. 2022a). When the MLSS is low, phosphorus removal depends mainly on the performance of phosphorus-accumulating organisms. The biological phosphorus removal process consists of phosphorus release and phosphorus uptake. Electron acceptors such as oxygen are necessary for phosphorus uptake, so more aeration is needed when the ITP is higher.

In terms of management regulation, a lower OLR implies a longer sludge retention time, which increases the MLSS. Hence, a lower ANLR was associated with less NH4+–N, and DO should theoretically correlate negatively with the OLR. Owing to the excess aeration strategy, however, DO usually fluctuated more strongly than the OLR demanded: as the OLR and influent COD increased, excessive aeration was supplied to ensure that the effluent COD satisfied the discharge standard, so DO was positively correlated with the OLR. The excess DO at the end of the aerobic tank returned to the anaerobic tank through the external recycle, resulting in a higher ORP; therefore, the correlation between energy consumption and the ORP at the end of the anaerobic tank was weak. Moreover, when the influent COD decreased, aeration control lagged behind the actual demand, resulting in excessive energy consumption for aeration.

Performance analysis of the XGB model

An XGB model and three other models were established based on the relationships between energy consumption and the selected parameters. The first 70% of the data were used as the training set, and the remaining 30% as the test set. A grid search was used to optimize the hyperparameters of all models (Table 3). The prediction performance is shown in Table 4 and Fig. 4. Compared with the other methods, the XGB model performed best. Although a gap remains between the real and predicted values, their variation trends are almost identical, which verifies that the model is feasible for predicting energy consumption.

Table 3 Hyperparameters of each model
Table 4 XGB model evaluation indicators
Fig. 4
figure 4

Comparison between the values predicted by XGB and the actual values. a Training set; b testing set
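The chronological 70/30 split described above (no shuffling, so the test period is strictly later than the training period) can be sketched as:

```python
import numpy as np

def time_ordered_split(X, y, train_frac=0.7):
    """Chronological split: the first 70% of the daily records train
    the model, the last 30% test it. No shuffling, so the test set
    never leaks information from the future into training."""
    cut = int(len(X) * train_frac)
    return X[:cut], X[cut:], y[:cut], y[cut:]

# Hypothetical stand-in for the 353 daily records with 8 selected inputs.
X = np.arange(353 * 8, dtype=float).reshape(353, 8)
y = np.arange(353, dtype=float)
X_tr, X_te, y_tr, y_te = time_ordered_split(X, y)
```

Preserving temporal order matters here because the records are a daily time series; a random split would let the model train on days that follow its test days.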

Parameter impact analysis

The F score was used to evaluate the influence of each input on energy consumption (Fig. 5). In descending order of effect, the parameters ranked as IFR, DO, IAN, ITP, ETN, ANLR, ORP and ETP. This order was similar to that obtained by the Kendall analysis: when a high-ranking parameter changes markedly, energy consumption changes accordingly.

Fig. 5
figure 5

F score of XGB

Among the uncontrollable parameters, the IFR, IAN and ITP had large influences. This means that once the treatment process and designed flow rate have been fixed, the energy consumption level is largely determined. Among the controllable parameters, ETN, DO and ANLR can be adjusted to influence energy consumption.

Energy saving performance

The energy savings achievable under different conditions were evaluated. For each parameter, its average, maximum and minimum values were fed into the established model while the other seven variables were held at their average values (Fig. 6). The energy saving efficiency was most sensitive to the influent flow rate, consistent with its maximal correlation coefficient. Furthermore, the variations in energy saving efficiency were also consistent with the Kendall correlation coefficients. In addition, the amount of energy saved at the maximum (or minimum) value was always similar to the amount wasted at the minimum (or maximum) value.

Fig. 6
figure 6

Energy saving efficiency
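The one-at-a-time evaluation described above can be sketched as follows; the linear surrogate stands in for the trained XGB model and is purely hypothetical:

```python
import numpy as np

def one_at_a_time(predict, X):
    """For each input, evaluate the model at that input's min / mean / max
    while the other inputs are held at their means."""
    means = X.mean(axis=0)
    out = {}
    for j in range(X.shape[1]):
        rows = np.tile(means, (3, 1))                       # all inputs at mean
        rows[:, j] = [X[:, j].min(), means[j], X[:, j].max()]  # vary input j only
        out[j] = predict(rows)
    return out

# Hypothetical surrogate: EC rises with input 0 and falls with input 1.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
ec = one_at_a_time(lambda R: R @ np.array([0.5, -0.02]), X)
```

The spread of each input's three EC values indicates its sensitivity, which is how the per-parameter energy saving efficiencies of Fig. 6 were obtained.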

In practical applications, these parameters often change simultaneously. Considering the synergy among the different parameters, the SEGA was used to minimize the energy consumption, with XGB serving as the mapping function between the eight parameters and EC. In the SEGA, the eight parameters are the genes of each individual, and the EC calculated by XGB is the fitness function. The upper and lower bounds (UBs and LBs, respectively) of the eight input parameters were determined by the discharge standard and the extreme values in the historical data. Owing to the randomness of the SEGA, several optimization runs were performed (Table 5).

Table 5 Optimized parameters

In scenarios 1–3, the UBs and LBs of the eight parameters were varied to investigate the resulting changes in the optimal EC. In scenario 1, the UBs of ETN and ETP were the maximum allowed values, 15 mg L−1 and 0.5 mg L−1, respectively; the other UBs were the historical maxima, and the LBs were zero (except that of ORP). The UBs of scenario 2 were the same as those of scenario 1, but its LBs were the historical minima. In scenario 3, the UB of ORP was set to 0, and the other boundary conditions remained the same as in scenario 2; this restriction on the search range of ORP did not affect the final energy savings. However, since water quality and quantity are infeasible to regulate in practice, their UBs and LBs were set to the mean values in scenario 4, and only the management regulation parameters were optimized, with their boundary conditions set according to the historical extreme values.

DO and ORP probes are widely used in practical applications. Based on previous research, the DO concentration at the end of the aerobic tank should be 1–5 mg L−1 (Qiu et al. 2017), and to achieve biological phosphorus removal, the ORP in an anaerobic environment should be no higher than −50 mV (Tae et al. 2005; Tang et al. 2012). The optimal parameters obtained from the GA were essentially within these reasonable ranges. According to the above results, 13–27% of the total energy consumption (22% on average) could be saved by optimizing the management process, while the effluent met the discharge standard at all times. The minimum energy consumption was obtained when the IFR was maximized, the IAN was close to its average value and the ANLR was high. In practice, energy savings could be achieved by setting the management regulation parameters near the optimization results: the optimal ORP could be reached by adjusting the internal reflow rate, and flexibly switching the air pumps and adjusting the air supply could bring DO to the value given by the GA. For the ANLR, because the IAN is uncontrollable, the difference between IAN and EAN was the determining factor. These results show that energy savings in WWTPs can be achieved by adjusting the operation parameters through the GA, which provides a simple and feasible energy saving strategy.

Conclusions

The energy consumption levels of WWTPs can be predicted and optimized by XGB and the SEGA. In terms of prediction, the XGB model achieved good performance, which was verified by a series of indicators, namely the R2, MAE, MAPE and RMSE metrics. The most important parameter influencing energy consumption is the influent flow rate; therefore, compared with small-scale WWTPs, large-scale WWTPs with high IFR values incur lower EC and operating costs. In terms of optimization, 13–27% of the total energy consumption (22% on average) could be saved with the optimized management regulation parameters obtained from the SEGA model. This research provides a convenient and reliable strategy for saving energy in WWTPs, which can be applied to other treatment processes in practical applications.