Introduction

One of the most important components of the aquatic life is certainly Dissolved oxygen concentration (DO). The principal sources of in-stream DO are: (1) diffusion from the atmosphere at the stream surface exchange, (2) mixing of the stream water at riffles, and (3) photosynthesis from in-stream primary production (O’Driscoll et al. 2016). DO is measured in milligrams per liter (mg/l). In the river ecosystems, DO is produced and consumed continuously and it is necessary to the fauna, flora and aquatic organisms. A reduction of level of DO may cause long-term adverse effects in the aquatic environment (Gonçalves and Costa 2013), and a deficiency of DO is a sign of an unhealthy river (Mondal et al. 2016). It is also reported that DO is an important factor influencing the dynamics of phytoplankton and zooplankton populations and a model has been recently proposed and tested describing the role of DO on the plankton dynamics (Dhar and Baghel 2016). Misra and Chaturvedi (2016) reported that DO concentration is the most important factor affecting subsequent survival of fish. Since then, some models using different modelling approaches have been proposed for estimating DO in rivers, streams and lakes ecosystems.

Abdul-Aziz et al. (2007a, b) developed an empirical model to adjust discrete DO measurements to a common time-reference value using an extended stochastic harmonic analysis (ESHA) algorithm. The model was calibrated and validated for different stream sites across Minnesota, USA, incorporating effects of different ecoregions and variable drainage areas. In order to validate the model, the authors have used with independent data for other sites in Minnesota. Considering DO as important water quality indicator, Costa and Gonçalves (2011) compared two approaches: the linear and the state-space models, the two models have been associated to the clustering technique. The authors have tried to identify and classify homogeneous groups of water quality based on similarities in the temporal dynamics of the DO concentration. Subsequently, they have compared the two models for predicting DO using data from River Ave basin, Portugal. The authors have obtained root mean square errors (RMSE) of 0.961 and 0.846 for the linear models and the state space model, respectively. Prasad et al. (2011) used MLR approach to develop a three dimensional model for prediction of spatially explicit DO levels in Chesapeake Bay, USA, by accounting for long-term variability of nutrient concentrations: total dissolved nitrogen (TDN), total dissolved phosphorus (TDP), water temperature (TE) and salinity (SA) across the Bay. The model has been applied at monthly time step and the step-wise regression approach was used to select a starting point for the MLR models relating DO concentrations to monthly water TE, SA, TDN and TDP levels. Akkoyunlu et al. (2011) examined the depth-dependent estimation of a lake’s DO using two ANN methods: (1) the RBFNN and the MLPNN, and (2) the multiple linear regression (MLR). The comparison results revealed that the ANN methods were noticeably superior to those of MLR in modelling the DO.

Ay and Kisi (2012) developed and compared two different artificial neural network (ANN) techniques, the multi-layer perceptron artificial neural network (MLPNN) and the radial basis function neural network (RBFNN), for modelling DO concentration. The ANN models were developed using experimental data collected from the upstream and downstream USGS stations on Foundation Creek, Colorado, USA. Antanasijević et al. (2013) developed and compared three types of ANN namely, generalized regression neural network (GRNN), backpropagation neural network (BPNN) and recurrent neural network (RNN), for the prediction of DO concentration in the Danube River, North Serbia. An innovative approach has been proposed by Areerachakul et al. (2013) that combine unsupervised and supervised artificial neural networks (ANN) based approaches. Using thirteen (13) water quality variables collected with a time step of 1 month, the authors have applied in the first part, the standard multilayer perceptron neural network (MLPNN). In the second part, they have applied the MLPNN with a priori unsupervised clustering methods: the K-mean and fuzzy c-mean algorithms, the combined model is called k-MLPNN, and applied to predict the DO in k clusters based on the 13 water quality. As a result of the investigation, the authors have demonstrated that k-MLPNN had higher predictive capability than a standard MLPNN model with coefficient of correlation (CC) of 0.83 and 0.62 for the k-MLPNN and MLPNN, respectively.

A model called real-value genetic algorithm support vector regression (RGA-SVR) has been proposed by Liu et al. (2013). The authors have applied the model for predicting water DO in in aquaculture river crab pond in China, and demonstrated that RGA-SVR provided best results in comparison to the standard support vector regression (SVR) and MLPNN. The root mean square error (RMSE) obtained in the testing phase was 0.0195, 0.051 and 0.283 for RGA-SVR, SVR and MLPNN, respectively. Wavelet neural network (WNN), that uses morlet wavelet as the wavelet transfer function in the hidden layer has been proposed by Xu and Liu (2013). The authors have applied the model for predicting DO in the Intensive freshwater pearl breeding ponds in Duchang county, Jiangxi province, China. Compared with prediction results achieved by the MLPNN and the Elman neural network (ELM), the low mean absolute percentage error (MAPE) was obtained with WNN. Kisi et al. (2013) investigated the accuracy of three artificial intelligence techniques, namely MLPNN, ANFIS and gene expression programming (GEP) in modelling daily DO in South Platte River at Englewood, Colorado, USA. As a conclusion of the investigation, the authors have demonstrated that GEP model performed better than the MLPNN and ANFIS models in modelling DO concentration. Liu et al. (2014) proposed at the first time a particular model that combining both wavelet analysis (WA) and least squares support vector regression (LSSVR) with an optimal improved Cauchy particle swarm optimization (CPSO) algorithm. The proposed hybrid model called WA-CPSO-LSSVR has been applied to predict DO in river crab culture ponds, at the Yixing base of intelligent aquaculture management systems in Jiangsu pro-vince, China. For comparison, the authors have applied for the same data set the standard LSSVR, and the flexible structure radial basis function neural network (FS-RBFNN). From the results obtained it can be concluded that the estimation results were significantly different and the WA-CPSO-LSSVR model provided good results in comparison to the other two. The CC obtained in the testing phase was 0.89, 0.92 and 0.96 for LSSVR, FS-RBFNN and WA-CPSO-LSSVR, respectively.

Heddam (2014a) applied generalized regression neural network (GRNN) based model for modelling hourly DO, at Klamath River, Oregon, USA. Evrendilek and Karakaya (2014) investigated the effects of discrete wavelet transforms (DWT) with the orthogonal Symmlet and the semi orthogonal Chui-Wang B-spline on predictive power of multiple non-linear regression models (MNLR) models for diel, daytime (diurnal) and nighttime (nocturnal) DO dynamics. Using three artificial intelligence-based models, Emamgholizadeh et al. (2014) have conducted an investigation for modelling DO in Karoon River, Iran. The authors have selected nine (9) water quality variables with a time step of 1 month. The investigated models were: radial basis function neural network (RBFNN), multilayer perceptron neural network (MLPNN) and adaptive neuro-fuzzy inference system (ANFIS) models. From the results reported by the authors MLPNN, outperforms ANFIS and RBFNN for DO prediction, producing a CC of 0.86, while the other two models (ANFIS and RBFNN) have provided a CC equal to 0.83 and 0.75, respectively. Abdul-Aziz and Ishtiaq (2014) conducted an investigation for predicting hourly DO time-series from different streams representing four distinct US Environmental Protection Agency (US EPA) Level III Ecoregions of Minnesota, USA. The authors have developed a scaling-based robust, empirical model for simulating the diurnal cycle of stream DO from a single reference observation, and have obtained a high CC rather than 0.94. Antanasijević et al. (2014) applied GRNN model with the Monte Carlo Simulation (MCS) technique for modelling DO, across multiple sites; located on the Danube River, North Serbia.

Heddam (2014b) developed and compared two adaptive neuro-fuzzy inference systems (ANFIS) for modeling hourly DO, at Klamath River, Oregon, USA. In another study, Heddam (2014c) applied an artificial intelligence (AI) technique model called dynamic evolving neural-fuzzy inference system (DENFIS) based on an evolving clustering method (ECM), for modelling hourly DO in Klamath River, Oregon, USA. In another study, Evrendilek and Karakaya (2015) used median and linear regression models of saturated dissolved oxygen (DOsat) after denoising by using discrete wavelet transform (DWT) with Chui-Wang B-spline and Coiflet wavelets decomposition. Alizadeh and Kavianpour (2015) compared two artificial neural networks models: standard MLPNN and wavelet-neural network (WNN) for predicting DO concentration, using a variety of water quality variables as input. Using data collected from Hilo Bay on the east side of the Big Island, the authors have developed the models at daily and hourly time step. For the model at hourly time step, they have reported a high CC rather than 0.98 and 0.97 in the validation and testing phase, respectively using the WNN model, while for the MLPNN the CC were 0.89 and 0.94 in the validation and testing phase, respectively. For the model at daily time step the CC for the WNN model was slightly less than in hourly time step but still well above the MLPNN model. In another study, Nemati et al. (2015) have compared three artificial intelligence modelling techniques namely, ANFIS, MLPNN and MLR for predicting DO in Tai Po River, New Territories, Hong Kong. In order to investigate the capability of these techniques for predicting DO, they have used data at time step of 1 month and eight (8) water quality variables were selected as input variables according to their correlations with DO. According to the results reported, MLPNN model outperforms ANFIS and MLR. The CC was 0.798, 0.645 and 0.681, for MLPNN, ANFIS and MLR, respectively. Recently, An et al. (2015) used the nonlinear grey Bernoulli model (NGBM (1, 1)) to simulate and forecasting DO in the Guanting reservoir (inlet and outlet), located at the upper reaches of the Yongding River in the northwest of Bei**g, China. Bayram et al. (2015) investigated the applicability of teaching–learning based optimization (TLBO) algorithm in modeling stream DO in turkey. The authors have used four stream water quality indicators, namely, water temperature (TE), pH, electrical conductivity (EC), and hardness (WH). The TLBO method is compared with those of the artificial bee colony algorithm (ABC) and conventional regression analysis methods (CRA). These methods are applied to four different regression forms: quadratic, exponential, linear, and power. Heddam (2016a) applied optimally pruned extreme learning machine (OP-ELM) in forecasting DO several hours in advance, at Klamath River, Oregon, USA.

Artificial intelligence (AI) techniques have been frequently applied in environmental modelling. Some of these applications include, among others, the following: prediction of reservoir permeability from porosity measurements (Handhal 2016); predictive modeling of discharge in compound open channel (Parsaie et al. 2015); automatic inversion tool for geoelectrical resistivity (Raj et al. 2015); forecasting monthly groundwater level (Kasiviswanathan et al. 2016); predicting the dispersion coefficient (D) in a river ecosystem (Antonopoulos et al. 2015); modelling the permeability losses in permeable reactive barriers (Santisukkasaem et al. 2015); estimating the reference evapotranspiration (ET0) (Adamala et al. 2015); calculating the dynamic coefficient in porous media (Das et al. 2015); predicting Indian monsoon rainfall (Azad et al. 2015), and modeling of arsenic (III) removal (Mandal et al. 2015). Although RBFNN has been applied for modelling DO concentration, to the best of our knowledge, there have been no studies done on the application of RBFNN for forecasting DO in rivers; hence the present study aims to investigate the capabilities of the RBFNN in comparison to the standard MLPNN for simultaneous modelling and forecasting of hourly DO concentration.

Methodology

Two models of artificial neural networks (ANN) are developed and compared in this study: the Multilayer Perceptron Neural Network (MLPNN) and the Radial Basis Function Neural Network (RBFNN). A brief description of these models is set forth hereafter.

Multilayer perceptron neural network (MLPNN)

Multilayer perceptron neural network (MLPNN) (Rumelhart et al.1986) is the most important type of ANN and its structure can be represented as in Fig. 1. The MLPNN is a feedforward network and has three layers: the input layer, the hidden layer and the output layer. Each layer contains a number of neurons. The processing ability of the network is stored in the inter unit connection strengths (or weights) that are obtained through a process of adaptation to a set of training pattern (Haykin 1999). The number of neurons in the input layer corresponds to the number of input variables; the input layer only collects information. The hidden layer is the important layer in the MLPNN model and contains several neurons, and each neuron in this layer is connected to the every neuron in the next and previous layer. Each neuron in the hidden layer calculates the sum of the weighted input and adds a bias value. The sum value obtained on this application is passed through a non-linear function known as the transfer function, which is usually a sigmoid function, to the output layer. The third layer is the output layer. There is only one neuron in this layer: the desired DO. The MLPNN are capable of approximating any function with a finite number of discontinuities (Hornik et al. 1989) and considered as a universal approximators (Hornik et al. 1989; Hornik 1991).

Fig. 1
figure 1

Architecture of multilayer perceptron neural network (MLPNN)

Let us denote k as the number of input variables, m as the number of neurons in the hidden layer, the mathematical structure of the MLPNN from the input to the output can be formulated as follow:

$$\mathop {\rm A}\nolimits_{j} = \mathop {\rm B}\nolimits_{1} + \sum\limits_{i = 1}^{k} {\mathop w\nolimits_{\text{ij}} \times \mathop x\nolimits_{i} } ,$$
(1)

where A j is the weighted sum of the j hidden neuron, k is the total number of inputs, w ij denotes the weight characterising the connection between the n th input to the m th hidden neuron, and B 1 is the bias term of each hidden neuron. The output of the m th hidden neuron is given by

$$\mathop \varUpsilon \nolimits_{j} = f\left( {\mathop {\rm A}\nolimits_{j} } \right).$$
(2)

The activation function f adopted for the present study was the sigmoid, represented by Eq. (3).

$$f\left( {\rm A} \right) = \frac{1}{{1 + \mathop e\nolimits^{ - A} }}.$$
(3)

The neural network output is then given by

$$\mathop {\rm O}\nolimits_{k} = \mathop {\rm B}\nolimits_{2} + \sum\limits_{j = 1}^{m} {\mathop w\nolimits_{jk} \times \mathop \varUpsilon \nolimits_{j} } ,$$
(4)

where w jk denotes the weight characterising the connection between the m th hidden neuron to the p th output neuron, m the total number of hidden neurons) and B 2 is the bias term. The linear activation function is most commonly applied to the output layer.

MLPNN is the most widely used neural network model, and has been applied to solve many difficult problems in environmental sciences. Some different applications are as follows. Prediction of uniaxial compressive strength of travertine rocks (Barzegar et al. 2016); temperature variations and generate missing temperature data in Iran (Salami and Ehteshami 2016); river flow forecasting (Kasiviswanathan and Sudheer 2016); prediction of water quality index in groundwater systems (Sakizadeh 2016); runoff simulation (Javan et al. 2015); runoff and sediment yield modeling (Sharma et al. 2015); modeling Secchi disk depth (SD) in river (Heddam 2016b); and predicting phycocyanin (PC) pigment concentration in river (Heddam 2016c).

Radial basis function neural network (RBFNN)

Proposed by Broomhead and Lowe (1988), radial-basis functions neural networks (RBFNN) is a feed-forward network and have three layers: the input layer, the hidden layer and output layer. The RBFNN uses a linear transfer function for the output neurons and a nonlinear Gaussian function for the hidden neurons (Moody and Darken 1989). The RBFNN neural network model has been proven to be a universal function approximator (Park and Sandberg 1991). To the mathematical point of view, the RBFNN structure shown in Fig. 2 can be presented as follow (Lin and Wu 2011):

$$\mathop \varphi \nolimits_{i} \left( x \right) = \left\| {x - \mathop \mu \nolimits_{i} } \right\|\quad i = 1,2, \ldots ,N,$$
(5)

φ (x) is the output of the j th hidden neuron, \(\left\| {\cdot} \right\|\) denotes the Euclidean distance, x is the p-dimensional input vector, µ i is the center (vector) of the ith hidden neuron, and φ is the activation function (Lin and Wu 2011). The RBFNN Gaussian function can be written as:

$$\mathop \varphi \nolimits_{i} \left( x \right) = \exp \left( { - \frac{{\mathop {\left\| {x - \mathop \mu \nolimits_{i} } \right\|}\nolimits^{2} }}{{2\mathop \sigma \nolimits_{i}^{2} }}} \right)\quad {\text{i}} = 1,{ 2},N,$$
(6)

where σ i is the widths (or spread) of the hidden neuron.

Fig. 2
figure 2

Architecture of radial basis function neural network (RBFNN)

The output of the RBFNN model can be calculated as follow

$$\mathop \varUpsilon \nolimits_{i} = \sum\limits_{j = 1}^{N} {\mathop w\nolimits_{ij} } \, \mathop \varphi \nolimits_{\text{j}} \left( x \right) \, + \, \mathop B\nolimits_{2}$$
(7)

w ij represents a weighted connections between the radial basis function neuron and output neuron; and N = number of hidden-layer neurons. The constant term B 2 in Eq. (7) represents a bias. The output of the network is a linear combination of the basis functions computed by the hidden layer nodes and the supervised gradient-descent-based method is used for the network training (Poggio and Girosi, 1990a, b).

In previous works, there have been reported some important applications of RBFNN in different areas of environmental science. Some of these applications are as follows. Modelling coagulant dosage in water treatment plant (Heddam et al. 2011); modelling daily ET0 (Ladlani et al. 2012); sequestration of soil organic carbon (SOC) in the agricultural surface soils and bottom sediments (Pal et al. 2016); spatial variability of soil organic carbon (Bhunia et al. 2016); predicting the side weir discharge coefficient (Parsaie 2016); simulation of nitrate contamination in groundwater (Ehteshami et al. 2016); groundwater salinity prediction (Barzegar and Moghaddam 2016), and predicting the longitudinal dispersion coefficient in rivers (Parsaie and Haghiabi 2015).

Description of study area

The historical hourly dissolved oxygen concentration (DO) and the four water quality variables data from (1 Jun 2014) to (31 May 2015) were used in this study and are available at the United States Geological Survey (USGS) website, http://or.water.usgs.gov/cgi-bin/grapher/table_setup.pl?site_id.Two stations are chosen: (USGS ID: 421015121471800)at Lost River Diversion Channel nr Klamath River, Oregon USA (Latitude 42°10′15″, Longitude 121°47′18″ NAD83), and (USGS ID: 421401121480900) at Upper Klamath Lake at Link River Dam, Oregon USA (Latitude 42°14′01″, Longitude 121°48′09″ NAD83). Figure 3 shows the locations of the stations in study area. For the two stations the data set is divided into three sub-data sets: (i) a training set (60 %), (ii) a validation set (20 %) and (iii) a test set (20 %).

Fig. 3
figure 3

[adopted from Sullivan et al. (2012, 2013a, b)]

Map showing the study area

Ranges of water quality data

The statistical parameters of the DO and water quality variables data such as the mean, maximum, minimum, standard deviation, and the coefficient of variation values (i.e., X mean , X max , X min , Sx, and C v respectively) are given in Table 2. Because the five variables described above had different dimensions, and there was major difference among values, it was considered to be necessary to standardize the primary data in order to enhance the training speed and the precision of the models. Input data were entered into the models after normalization. For this purpose, Eq. (8) was utilized:

$$\mathop {\mathop x\nolimits_{ni,k} }\nolimits_{{}} = \frac{{\mathop x\nolimits_{i,k} - \mathop m\nolimits_{k} }}{{\mathop {Sd}\nolimits_{K} }},$$
(8)

x ni, k : is the normalized value of the variable k (input or output) for each sample i,. x i , k the original value of the variable k (input or output). m k and S dk are the mean value and standard deviation of the variable k (input or output). All the input and output variables were normalized to have zero mean and unit variance (Heddam et al. 2012, 2016; Heddam 2014d, 2016b, c).

From the Table 1, it can be seen that pH and SC have a direct relationship with the DO, with a CC equal to 0.16 and 0.49, respectively, while SD and TE water quality variable have an inverse relationship with the DO with a CC equal to −0.26 and −0.59, respectively, for the USGS 421015121471800 station. Always, in Table 1 for the USGS 421401121480900 station, it can be seen that pH and SD have a direct relationship with the DO, with a CC equal to 0.09 and 0.25, respectively, while SC and TE water quality variable have an inverse relationship with the DO, with a CC equal to −0.25 and −0.63, respectively. According to Table 2, for the USGS 421015121471800 station, DO concentrations ranged over three orders of magnitude, with minimum and maximum values of 0.1 and nearly 30 mg/L (30.50 mg/L). The mean of all observations was 8.20 mg/L. At the USGS 421401121480900, DO concentrations ranged over three orders of magnitude, with minimum and maximum values of 1.90 and nearly 16 mg/L (15.80 mg/L). The mean of all observations was 9.58 mg/L. According to Table 2, temperature inversely related to the concentration of DO in water; as temperature increases, DO decrease. Conversely, a temperature decline causes the oxygen concentration to increase.

Table 1 Pearson correlation coefficients between and among physical water-quality parameters, and dissolved oxygen concentration
Table 2 Hourly statistical parameters of data set

Performance indices

Any developed models must be evaluated regarding their performances. In the present study we computed three performances indices in order to validate and compare the models developed. The three indices are calculated according to Legates and McCabe (1999) and Moriasi et al. (2007): the coefficient of correlation (CC), the root mean squared error (RMSE) and the mean absolute error (MAE).

$$CC = \frac{{\frac{1}{\rm N}\sum\limits_{{}}^{{}} {\left( {\mathop {\rm O}\nolimits_{i} - \mathop {\rm O}\nolimits_{m} } \right)\left( {\mathop {\rm P}\nolimits_{i} - \mathop {\rm P}\nolimits_{m} } \right)} }}{{\sqrt {\frac{1}{\rm N}\sum\limits_{i = 1}^{n} {\mathop {\left( {\mathop {\rm O}\nolimits_{i} - \mathop {\rm O}\nolimits_{m} } \right)}\nolimits^{2} } } \sqrt {\frac{1}{\rm N}\sum\limits_{i = 1}^{n} {\mathop {\left( {\mathop {\rm P}\nolimits_{i} - \mathop {\rm P}\nolimits_{m} } \right)}\nolimits^{2} } } }},$$
(9)
$$RMSE = \sqrt {\frac{1}{\rm N}\sum\limits_{i = 1}^{\rm N} {\mathop {\left( {\mathop {\rm O}\nolimits_{i} - \mathop {\rm P}\nolimits_{i} } \right)}\nolimits^{2} } } ,$$
(10)
$$MAE = \frac{1}{\rm N}\sum\limits_{i = 1}^{\rm N} {\left| {\mathop {\rm O}\nolimits_{i} - \mathop {\rm P}\nolimits_{i} } \right|} ,$$
(11)

where N is the number of data points, O i is some measured value and P i is the corresponding model prediction. O m and P m are the average values of O i and P i .

Results and discussion

As stated above, the present study has two majors objectives: one is modeling DO using water quality variables as predictors and the second is the forecasting of DO at different hours in advance. In this section we present the results obtained separately.

Modelling DO concentration

Modelling DO in the USGS 421015121471800 station

Five models were developed and compared. The five models are the four-factor input vector model (TE, pH, SC and SD), called M5; the three-factor input vector model (TE, pH and SC), called M4; the three-factor input vector model (TE, pH and SD), called M3; the two-factor input vector model (TE and SC), called M2 and the two-factor input vector model (TE and pH), called M1, respectively (Table 3). A comparison of the performance of the RBFNN model with that of the MLPNN model was carried out to study their efficacy in modelling DO concentration. The performances of the five (M1 to M5) developed models are measured on the test set according to the three performances indices and the results are reported in Table 4.

Table 3 Combinations of input variables considered in develo** models
Table 4 Performances of the RBFNN and MLPNN models in different phases for USGS 421015121471800 station

In all five RBFNN models developed, the key parameter called spread (σ) is the important parameter that must be optimized during the training process. The spread parameter values providing the best testing performance of the RBNN were equal to 1. As is shown in Table 4, for the RBFNN model, it is observed that the MAE, RMSE and CC values vary in the range of 0.440–1.771 mg/L, 0.747–2.359 mg/L and 0.837–0.985, respectively, in the training phase. In addition, in the validation phase, the values of MAE, RMSE and CC, ranged from 0.521 to 1.759, 0.855 to 2.381 mg/L, and 0.838 to 0.981, respectively. Finally, in the testing phase, the values of MAE, RMSE and CC, ranged from 0.518 to 1.770, 0.884 to 2.388 mg/L, and 0.827 to 0.978, respectively. It may be seen from Table 4, the CC values for all the six models are reasonably good, being smallest (0.827) for M2 model and greatest (0.985) for M5 model. The values of other model performances such as RMSE, and MAE indicate that the forecast performance of the RBFNN model is very good, except the model M2 that is relatively acceptable, and the RBFNN (M5) model performed better than the other models in the training, validation, and testing phases. It is important to state that the investigation showed that the worst results were achieved using the model M2 that has the TE and SC as inputs. The Scatterplots and comparison of observed and calculated values of DO in the Training, Validation and Testing phase, respectively, are shown in Fig. 4 for the RBFNN M5 model, in the USGS 421015121471800 Station.

Fig. 4
figure 4

Results with RBFNN model for USGS 421015121471800 station. Scatterplots and comparison of observed and calculated series of DO in the: a training, b validation and c testing phase, respectively

A multilayer perceptron neural network (MLPNN) as shown in Fig. 2 has been developed for modelling DO using the same input variables reported above. The proposed MLPNN has three layers: an input layer with two to four input variables under the (M1 to M5), a hidden layer with a nonlinear sigmoid transfer function and a linear output layer with only one neuron that correspond to the DO. The weights and biases are the unique parameters that must be optimized in the MLPNN model using a training algorithm. The optimum number of neurons in the hidden layer is determined by trial and error. We have varied the number of neurons from one to twenty and we found that a model with thirteen neurons at the hidden layer corresponds to the best model. The parameters of MLPNN have been optimized using the error back propagation algorithm which is an iterative Learning algorithm. As seen from Table 4, the five MLPNN models have shown significant variations based on the three performance criteria. The lowest value of the RMSE in the testing phase is 1.013 (in MLPNN M5) and the highest value of the CC is 0.971 (in MLPNN M5). In addition, the lowest value of MAE is 0.634 also (in MLPNN M5). From the results of training, validation, and testing all the five models developed in this study are evaluated all together, and the M1, M3, M4, and M5 models are conspicuous. Among these, the M4 and M5 models have quite low MAE and high CC, and the M5 model is very successful on validation and testing phase. All these five models were examined comparing their ability on modelling hourly DO concentration. During training, the MLPNN (M5) performs slightly better than the others. Also, in the validation and testing phases, the MLPNN (M5) outperforms all other models in terms of various performance criteria. The statistical indicators in the Table 4 indicate that the calculated DO using RBFNN are more accurate compared to MLPNN models (relatively low values of MAE and RMSE, and high values of CC). In conclusion, the M5 model is the best developed model for modelling DO concentration, and RBFNN performs better than MLPNN model. The Scatterplots and comparison of observed and calculated values of DO in the Training, Validation and Testing phases are shown in Fig. 5 for the MLPNN M5 model, in the USGS 421015121471800 Station.

Fig. 5
figure 5

Results with MLPNN model for USGS 421015121471800 station. Scatterplots and comparison of observed and calculated series of DO in the: a training, b validation and c testing phase, respectively

Modelling DO in the USGS 421401121480900 station

The accuracy and performance of RBFNN model for modelling DO in the USGS 421401121480900 station are evaluated and compared using RMSE, MAE, and CC statistical criterion. Table 5 shows all these criteria in the training, validation and testing phases. The table shows that, all models (M1 to M5) have a small RMSE value, particularly M3, M4 and M5 models. According to Table 5 for all three RBFNN models (M3, M4 and M5), the performance in the training phase was slightly better than the performance for the validation and testing phases, with only few improvements, with the exception of the model M5 where the difference was statistically significant. Nevertheless, the M5 model must be considered as the best model developed. However, in accordance of the results obtained in the previous station the M2 model that used SC and TE as input, performed much poorer than that the others in terms of RMSE, MAE, and CC. As seen from Table 5, the five RBFNN models have shown significant variations based on the three performance criteria. In the training phase, the lowest value of the RMSE of the models is 0.287 mg/L (in RBFNN M5) and the highest value of the CC is 0.991 (in RBFNN M5). In addition, the lowest value of MAE is 0.184 mg/L also (in RBFNN M5). Table 5 indicates that the RBFNN (M5) has the smallest MAE (0.281 mg/L), RMSE (0.461 mg/L), and the highest CC (0.978) in the validation phase; and in the testing phase the RBFNN (M5) has the smallest MAE (0.312 mg/L), RMSE (0.644 mg/L) and the highest CC (0.955). The Scatterplots and comparison of observed and calculated values of DO in the Training, Validation and Testing phase, respectively, are shown in Fig. 6 for the RBFNN M5 model, in the USGS 421401121480900 station.

Table 5 Performances of the RBFNN and MLPNN models in different phases for USGS 421401121480900 station
Fig. 6
figure 6

Results with RBFNN model for USGS 421401121480900 station. Scatterplots and comparison of observed and calculated series of DO in the: a training, b validation and c testing phase, respectively

As seen from Table 5, the five MLPNN models have shown significant variations based on the three performance criteria. The lowest value of the RMSE of forecasting models is 0.453 (mg/L) (in MLPNN M5) and the highest value of the CC is 0.976 (in MLPNN M5). In addition, the lowest value of MAE is 0.300 (mg/L) also (in MLPNN M5). From the results of training, validation, and testing all the five models developed in this study are evaluated all together, and the M1, M3, M4, and M5 models are conspicuous. Among these, the M4 and M5 models have quite low MAE and high CC, and the M5 model is very successful on testing phase. All these models were examined comparing their ability on modelling hourly DO concentration. During training, the MLPNN (M5) performs slightly better than the others. Also, in the validation and testing phases, the MLPNN (M5) outperforms all other models in terms of various performance criteria. In conclusion, the M5 model is the best developed model for modelling DO concentration. The comparison between the RBFNN and MLPNN clearly show the differences between the two models which favour the MLPNN in testing phase, while the RBFNN outperform MLPNN in the training and validation phases, thereby establishing the superiority of the RBFNN models. In the training phase, the RBFNN M5 improved the MLPNN M5 of about 1.5 % regarding the CC value. In addition, in the validation phase as seen in Tables 5, the RBFNN M5 improved the MLPNN M5 of about 0.2 % regarding the CC value. In the testing phase, the MLPNN M5 improved the RBFNN M5 of about 1.7 % regarding the CC value. The Scatterplots and comparison of observed and calculated values of DO in the Training, Validation and Testing phase, respectively, are shown in Fig. 7 for the MLPNN M5 model, in the USGS 421401121480900 Station.

Fig. 7
figure 7

Results with MLPNN model for USGS 421401121480900 station. Scatterplots and comparison of observed and calculated series of DO in the: a training, b validation and c testing phase, respectively

Forecasting DO concentration

Notwithstanding, the importance of the developed models for estimating DO, it should be noted that they are linked to the water quality variables, and the models cannot be done and if they are done they can only be done with available, timely and reliable water quality data (Heddam 2016a). It would be interesting to investigate the capabilities of the proposed models (RBFNN and MLPNN) for forecasting DO at different level in advance using only the time series of DO and without the need for water quality variables as input to the models. Attempts to do this have not been entirely investigated in the past, and the present study is one of the first studies presented in the literature for forecasting DO time series. The statistics of the hourly DO Time series used for develo** the forecasting models are shown in Table 6. We have developed six forecasting models called FM1 to FM6 using the same input data and have different output, the structure are shown in Table 7 below. It can be seen from Table 7; the six forecasting models have the same input structure: the input variables present the previously measured DO (t  3, t  2, t  1 and t) and the output variable corresponds to the DO at time t + 1, t + 12, t + 24, t + 48, t + 72 and t + 168, where DO (t) corresponds to the DO at time t. Figure 8 shows how the input and output data sets are created. According to Table 7, all the six developed models are basically approximators of the general equation, where n is the next time step:

$$DO \, (t + n) = f\left[ {DO \, (t) + DO \, (t - 1) + DO \, (t - 2) + DO \, (t - 3)} \right].$$
(12)
Table 6 Hourly statistical parameters of data set for forecasting models
Table 7 Combinations of input variables considered in develo** forecasting models
Fig. 8
figure 8

Illustration of data input and output format for the MLPNN and RBFNN forecasting models and the multi hours in advances forecasting scheme

Table 8 represents the MLPNN and RBFNN results of DO forecasting in different phases, several hours in advance, for USGS421401121480900 station. On analyzing Table 8, it is apparent that the performance of the RBFNN and MLPNN are good until 72 h in advance, since the CC are rather than 0.92 in the validation phase. In the testing phase the two models have a CC equals to 0.74. From the Table 8 it can be observed that the model FM1 whose output is the DO at (t + 1) performed better than the others models in the training, validation, and testing phases. For the RBFNN model and according to Table 8, in the training phase, the values of CC, RMSE and MAE, ranged from 0.796 to 0.989, 0.322 to 1.356, and 0.220 to 0.948, respectively. In addition, in the validation phase, the values of CC, RMSE, and MAE ranged from 0.753 to 0.997, 0.089 to 0.807, and 0.065 to 0.649, respectively. Finally, in the testing phase, the values of CC, RMSE, and MAE ranged from 0.533 to 0.997, 0.071 to 0.704, and 0.054 to 0.606, respectively. It may be seen from Table 8, the results obtained using the MLPNN models are generally similar to those obtained by the RBFNN models, and there were some noticeable differences. In the training phase, MLPNN models FM1, FM3, FM4 and FM5 are superior to the same RBFNN models and the difference was more marked amongst the CC coefficients. From the Table 8 it can be revealed that RBFNN and MLPNN FM1 models with 1 h ahead obtained the best statistics of CC (0.997 and 0.997), RMSE (0.067 mg/L and 0.071 mg/L), and MAE (0.052 mg/L and 0.054 mg/L) respectively, in the testing phase. An important conclusion from the results obtained is that that increasing the forecasting horizon from (1) to (168) h ahead decreases the model accuracy. The CC decreases from 0.99 to 0.53 in testing phase and the RMSE and MAE increases from (0.067 and 0.052) to (0.704 and 0.606) always in the testing phase.

Table 8 Performances of the MLPNN and RBFNN forecasting models in different phases several hours in advance for USGS 421401121480900 station

Table 9 represents the MLPNN and RBFNN results of DO forecasting in different phases, for USGS 421015121471800 station. In general, CC were high (0.79 to 0.99), RMSE and MAE are low (0.450 mg/L to 2.143 mg/L, 0.253 mg/L to 1.581 mg/L) in the training phase. From the Table 9 it can be observed that the model FM1 either for the MLPNN or RBFNN, performed better than the others models in the training, validation, or testing phases. Overall, in the testing phase, RBFNN have higher CC value and lower RMSE and MAE values than those of the MLPNN. According to Table 9, the FM6 model is good in the training phase, but very poor with very low CC and very high RMSE and MAE in the validation and testing phases. The CC ranged from (0.111 to 0.224), RMSE from (1.210 to 1.171 mg/L), and MAE from (0.934 to 0.895 mg/L). An important conclusion from the results obtained is that that increasing the forecasting horizon from (1) to (168) hours ahead decreases the model accuracy. The CC decreases from 0.97 to 0.111 in testing phase and the RMSE and MAE increases from (0.287 and 0.192) to (1.210 and 0.934) always in the testing phase. Finally, two points have to be highlighted: first, we think it is very important that we should investigate the capabilities of the proposed models using a long-term data set rather than 1 year. The second and final point to be stated is that the proposed models are a powerful tool for forecasting DO up to 72 h (3 days) ahead with good accuracy.

Table 9 Performances of the MLPNN and RBFNN forecasting models in different phases several hours in advance for USGS 421015121471800 station

Conclusion

In this study, two well know artificial intelligences techniques namely, RBFNN and MLPNN were developed for modeling and forecasting DO using water quality variables and antecedent values of DO, respectively. Given a set of training data, the two models provide a good and powerful tool for estimating DO. To demonstrate the usefulness of the models, we used data obtained from two particular stations operate by the USGS data survey. In the modeling phase, we developed five models with different combinations of input variables and we select the model that has the best performance according to three performances criteria: RMSE, MAE and CC. As a result we have obtained a CC ranged from 0.82 to 0.99, 0.82 to 0.98 and 0.81 to 0.97, in the training, validation and testing phase respectively and the best results are obtained with the model that contains all candidate input variables:water pH, TE, SC, and SD. Also, it is important to note that using only two variables as inputs, which are TE and pH, we have obtained a high CC approximately 0.96 in the testing phase that is very promising and encouraging. In the forecasting phase, we compared six models using the same input structure and we have attempted, however, to forecast, as much as possible the DO at different hours in advance, from 1 h in advance up to 168 h (7 days) in advance. The results obtained are very promising, and we demonstrated that until 72 h in advance we have a high CC approximately 0.92 in the validation phase and 0.74 in the testing phase. At 168 h (7 days) in advance we obtained a low result with a CC close to 0.54. The reasons for this low level of results can be explained in a number of ways. Firstly, the length of the data set is probably insufficient and a data base that covers more than 1 year is necessary, this is in order to include in the validation and testing phases the all four seasons. Furthermore, it may also help if we applied data-preprocessing techniques like wavelet multi-resolution analysis, coupled with artificial intelligences techniques, especially for forecasting longer hours in advance. In the future, further research is necessary to improve the prediction accuracy of the proposed models and it would be interesting to evaluate the applied models for a long period rather than 1 year.