1 Introduction

The growth of the world population, industrialization and urbanization is coupled with a surge in the demand for a high quality of life. To meet these human needs, various industries were established. Alongside their benefits, these industries bring several problems, such as the production of wastewater [23]. Effluent discharges (effluents bearing metallic and nonmetallic pollutants) are considered major polluters of ecosystems, surface water and groundwater, with interrelated negative impacts on food quality and health. The toxic and carcinogenic properties of these pollutants can seriously harm human health and degrade water quality [4].

The shortage of freshwater resources is the greatest challenge of this century [3]. United Nations General Assembly Resolution 64/292 defined access to safe water as an absolute human right, and access to water lies at the core of the sustainable development goals [1]. Maintaining water quality to avoid adverse effects on human health, fauna and flora is therefore crucial. Deteriorating water quality also has extensive economic implications for several sectors, such as agriculture and recreation. Protecting the available water resources is of utmost importance, especially given the existing shortage of water, mainly in arid and semi-arid areas. Municipal solid waste and landfills are considered the main agents of water pollution. Various improvements have been incorporated into municipal solid waste management, ranging from reduction to recycling, composting and incineration. However, landfilling is still the final and most widely adopted disposal option, and numerous countries worldwide still discard their waste in landfills. In 2013, the European Union member states Greece, Croatia, Cyprus, Latvia, Malta and Romania landfilled around 75% of their waste. In 2012, more than half (53.8%) of the waste in the USA was disposed of in landfills. Nearly 100% of the waste generated in low- to middle-income developing countries is sent to landfills.

In Lebanon, Naameh Landfill was the main disposal facility, receiving the waste of around 2.2 million people and serving the greater Beirut area. Naameh Sanitary Landfill, which began operating in 1997, is located in an old quarry site in the Shouf caza. A significant portion (>70%) of the residual waste from the sorting and composting plants was baled, wrapped and disposed of at Naameh Landfill. Although it was intended to operate for 10 years, the landfill remained in operation until 2015. Its closure led to a serious waste crisis in which piles of municipal waste accumulated in the streets and forests for more than a year.

Landfilling can have several detrimental effects, either because landfill areas are developed without planning and may lack appropriate engineering controls, or as a consequence of oversaturation. This can lead to the formation of leachate that escapes containment and subsequently infiltrates into the groundwater. The migration of landfill leachate into groundwater therefore poses a serious environmental threat at both engineered and uncontrolled landfill sites. Leakage effects have been detected several times even at engineered sanitary landfills equipped with geomembrane liners. Assessing the potential risk of groundwater contamination from landfill operation is therefore of prime importance.

Groundwater contamination has usually been evaluated through analysis of physicochemical characteristics, water quality indices or proposed water quality standards. However, it is extremely difficult to discern the extent of groundwater contamination and to draw meaningful conclusions from the complex pattern of water parameters without advanced statistical techniques. Hence, a number of multivariate methods are frequently used to assess spatial and temporal variations in water quality, including principal component analysis (PCA), cluster analysis (CA), factor analysis (FA) and the self-organizing map (SOM) [14]. To detect non-linear relationships between parameters, two techniques (CA and SOM) were applied simultaneously in this study to provide fundamental research on the possible influence of Naameh Landfill on the groundwater quality in its vicinity. This was done by analyzing a broad spectrum of physicochemical, metal and microbiological parameters from five selected groundwater wells. The principal components (PCs) technique reflects the linear relationships between parameters and allows the most significant ones to be interpreted; its application to the interpretation of water quality parameters has proved successful in several studies [9, 28]. To broaden the scope of understanding, a non-linear clustering method, SOM, was also applied. However, only a few studies [5, 13, 21] have used SOM to identify the effects of a landfill on the surrounding water quality. Combining both methods enables the characterization of highly complex water quality data.

Lebanon is suffering from a prolonged environmental and solid waste crisis that threatens the water resources and public health of its communities. Apart from a single study, artificial neural networks have not been applied to groundwater contamination at landfills in Lebanon; that study addressed the removal of mercury from wastewater using micellar-enhanced ultrafiltration, combining experiments with artificial neural network modeling [30]. Another study [8], part of a public participatory research project, assessed the impact of solid waste disposal practices on water in the villages surrounding Naameh Landfill. Its results indicated the presence of organic compounds at low levels, as well as zinc, which may indicate contamination by landfill leachate. This result underlines the need for continuous monitoring of Naameh Landfill.

The main objective of this study is to develop a conceptual groundwater model that uses an Artificial Neural Network (ANN) to predict the next-period values of water quality parameters such as mercury. Such a model supports an adequate design of the groundwater quality monitoring network and can reveal emerging contamination risks, so that the risk posed to groundwater after Naameh Landfill stopped operating (2015) and after the capping process was completed (2020) can be assessed [26]. This study demonstrates the application of artificial neural networks (ANNs) to evaluate groundwater quality at Naameh Landfill by predicting selected water quality parameters, and examines whether one of these parameters can be predicted. The study thus provides insight into the past, present and expected future status of groundwater at the landfill in order to track possible contamination, especially given the ongoing solid waste crisis in Lebanon and the absence of solutions. It provides unique data that can serve as a reference for building sustainable solutions, and the model can serve as a reference for proposing an adequate scientific approach to the closure of municipal solid waste (MSW) landfill sites.

2 Methods and materials

2.1 Study area

Naameh Sanitary Landfill, situated in the Shouf caza, is a deserted old quarry site that was transformed into a municipal solid waste disposal facility in 1998. It is located across a seasonal watercourse 16 km south of Beirut (Lebanon) and 4 km from the coastline, at an average altitude of 250 m [10, 24].

The study area has a semi-arid climate and experiences four seasons (winter, spring, summer and fall). Summer temperatures (June-August) range between 19°C and 33°C, whereas winter temperatures (December-February) range between 8°C and 23°C [11]. The landfill was originally designed to cover an area of 120,000 m², to operate for 10 years and to receive 2 million tons of waste. However, a series of extensions kept it in operation for an additional 8 years. As a result, the landfill held 14.8 million tons of waste and covered around 300,000 m² of land [6].

Naameh Landfill is the only engineered sanitary landfill with specific containment technologies, consisting of several liner layers, a leachate collection and treatment system, a surface water drainage system, a gas management system and a final capping system (Fig. 1). One of the major concerns of landfill pollution is the resulting groundwater contamination caused by leachate infiltration. The main groundwater table beneath the landfill lies at a depth of 250 to 350 m below ground level, with a westward flow towards the Mediterranean Sea [10].

Fig. 1 Location map of Naameh Landfill

Quantitative and qualitative modeling of the groundwater system requires a large data set on the physicochemical, metal and microbiological parameters of the water, both for adequate prediction of the hydrological variables and for a comprehensive understanding of the physical processes occurring in the landfill [19].

Groundwater samples were therefore collected and analyzed between 2011 and 2019 from five representative wells at upstream and downstream locations. The locations and coordinates of the sampling sites are shown in Fig. 2 and Table 1.

Table 1 Coordinates of sampling locations
Fig. 2 Topographic map representing the sampled points

Samples were collected regularly on a seasonal basis in order to monitor variations in the selected parameters. The physicochemical parameters tested included pH, Conductivity (EC), Biological Oxygen Demand (BOD), Chemical Oxygen Demand (COD), alkalinity, Chloride (Cl⁻), Ammonia-N (NH₃-N), Orthophosphate (PO₄³⁻), Cyanide (CN), Total Oxidized Nitrogen (TON), Total Suspended Solids (TSS) and Sulphate (SO₄²⁻). The metal parameters tested on a biannual basis were Iron (Fe), Lead (Pb), Chromium (Cr), Cadmium (Cd), Calcium (Ca), Copper (Cu), Zinc (Zn), Nickel (Ni), Arsenic (As), Manganese (Mn), Magnesium (Mg), Potassium (K), Sodium (Na) and Mercury (Hg).

For the microbiological analysis, two main parameters were selected: total coliform and fecal coliform. In the laboratory, samples were analyzed according to established methods and procedures from the "Standard Methods for the Examination of Water and Wastewater" [24]. The analysis methods and the corresponding reference methods are listed in Table 2.

Table 2 Analysis methods of the parameters. SM: Standard Methods for the Examination of Water and Wastewater; HACH: certified procedures accepted by the USEPA for use in the analysis of water and wastewater

2.2 Artificial Neural Network predictive model

2.2.1 Development of ANN predictive models

A systematic methodology is important when developing a neural network model, taking into account factors such as data preparation, selection of suitable model inputs, appropriate network architecture, and model training and testing [16]. In this work, feed-forward neural networks were used to predict the water quality parameters of groundwater at Naameh Landfill. Two types of network, differing in their training algorithms, were considered: the multilayer perceptron (MLP) and the radial basis function (RBF) network. Both were trained on the observed data to develop a method for predicting the parameters of the groundwater wells. Neither network has feedback connections between layers or connections between units in the same layer. Both work in a supervised manner, classify and solve problems adequately, are easy to use, act as universal approximators, have good nonlinear capabilities and are widely used members of the feed-forward network family. To train the MLP, the performance of the backpropagation algorithm was enhanced by incorporating the Levenberg-Marquardt (LM) algorithm. LM is a gradient-based, deterministic local optimization algorithm. When used to train an MLP, its advantage over plain backpropagation is a faster (second-order) convergence rate while remaining comparatively stable [22]. Like quasi-Newton methods, the Levenberg-Marquardt algorithm was designed to approach second-order training speed without computing the Hessian matrix. Incorporating LM into backpropagation not only shortens training time but also improves training precision and convergence properties [27].
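The Levenberg-Marquardt update described above can be sketched in NumPy for a one-hidden-layer MLP with tanh hidden units and a linear output. This is a minimal illustration under stated assumptions, not the MATLAB toolbox routine used in the study: the helper names (`unpack`, `forward`, `jacobian`, `train_lm`), the initial damping `mu`, the divide/multiply-by-10 schedule and the synthetic data are all hypothetical. Each iteration solves (JᵀJ + μI)δ = Jᵀe, where J is the Jacobian of the network outputs with respect to all weights and e is the residual vector; JᵀJ stands in for the Hessian, which is never computed.

```python
import numpy as np


def unpack(theta, d, h):
    """Split the flat parameter vector into W1 (h x d), b1 (h), w2 (h) and b2 (scalar)."""
    W1 = theta[: h * d].reshape(h, d)
    b1 = theta[h * d : h * d + h]
    w2 = theta[h * d + h : h * d + 2 * h]
    b2 = theta[-1]
    return W1, b1, w2, b2


def forward(theta, X, h):
    """Tanh hidden layer followed by a linear output neuron."""
    W1, b1, w2, b2 = unpack(theta, X.shape[1], h)
    A = np.tanh(X @ W1.T + b1)          # hidden activations, shape (n, h)
    return A @ w2 + b2, A


def jacobian(theta, X, h):
    """Analytic Jacobian of the n outputs with respect to every parameter."""
    n, d = X.shape
    _, _, w2, _ = unpack(theta, d, h)
    _, A = forward(theta, X, h)
    G = (1.0 - A**2) * w2               # dy/d(hidden pre-activation) = dy/db1
    J_W1 = (G[:, :, None] * X[:, None, :]).reshape(n, h * d)
    return np.hstack([J_W1, G, A, np.ones((n, 1))])   # same order as unpack


def train_lm(X, y, h=4, epochs=200, mu=1e-2, seed=0):
    """Levenberg-Marquardt: damped Gauss-Newton steps on the sum of squared errors."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(scale=0.5, size=h * X.shape[1] + 2 * h + 1)
    sse = np.sum((y - forward(theta, X, h)[0]) ** 2)
    for _ in range(epochs):
        e = y - forward(theta, X, h)[0]                  # residual vector
        J = jacobian(theta, X, h)
        # Solve (J^T J + mu I) delta = J^T e; J^T J approximates the Hessian.
        delta = np.linalg.solve(J.T @ J + mu * np.eye(theta.size), J.T @ e)
        trial = theta + delta
        trial_sse = np.sum((y - forward(trial, X, h)[0]) ** 2)
        if trial_sse < sse:          # accept the step and trust the quadratic model more
            theta, sse, mu = trial, trial_sse, max(mu / 10, 1e-10)
        else:                        # reject the step and move towards gradient descent
            mu = min(mu * 10, 1e10)
    return theta, sse


# Synthetic, already normalized data (not the study's measurements).
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(60, 3))
y = np.sin(2 * X[:, 0]) + 0.5 * X[:, 1] - 0.3 * X[:, 2]
theta, sse = train_lm(X, y, h=4)
print("training SSE:", sse)
```

A small `mu` makes the step close to Gauss-Newton, while a large `mu` shrinks it towards a short steepest-descent step (δ ≈ Jᵀe/μ); this is why the damping is decreased after a successful step and increased after a rejected one.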
The RBF network is trained with the Orthogonal Least Squares (OLS) algorithm introduced by [7]. The steps followed in developing the ANN models for predicting water quality parameters at Naameh Landfill are described briefly in the following subsections and summarized in Fig. 3.

Fig. 3 Methodology of our study

2.2.2 Collection of data

The water quality data were generated from five selected groundwater points at Naameh Landfill over the monitoring period (February 2011 to February 2019) in order to predict water quality parameters using ANN technology.

All the data generated during this study (89 readings) were combined into one set to test whether neural network models could be developed to predict water quality parameters, including mercury concentrations.

2.2.3 Performance criteria selection

The performance criteria used to judge a model should be defined before its structure is designed. In this study, prediction accuracy was used as the performance criterion when building the ANN predictive models. Several measures of prediction accuracy have been proposed in the literature [15, 17, 18, 29].

2.2.4 Data division

It is common practice to divide the collected data into two subsets, the training set and the testing set. Both sets should belong to the same population.

In this work, approximately 80% of the collected data were used for training and the remaining 20% were used to test the developed predictive models.
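The paper does not say how readings were allocated to the two subsets. Assuming a random split, an 80/20 division of the 89 readings can be sketched as follows; `split_80_20` is a hypothetical helper and the arrays are random placeholders, not the study's data.

```python
import numpy as np


def split_80_20(X, y, train_fraction=0.8, seed=42):
    """Shuffle the row indices once, then take the first ~80% for training."""
    idx = np.random.default_rng(seed).permutation(len(y))
    n_train = int(round(train_fraction * len(y)))
    train, test = idx[:n_train], idx[n_train:]
    return X[train], X[test], y[train], y[test]


# 89 readings as in the study; the values are random placeholders.
rng = np.random.default_rng(0)
X = rng.normal(size=(89, 3))
y = rng.normal(size=89)
X_tr, X_te, y_tr, y_te = split_80_20(X, y)
print(len(y_tr), len(y_te))   # 71 training and 18 testing readings
```

If the aim is to forecast future values, a chronological split (earlier readings for training, later readings for testing) would be the more conservative alternative, because a random split lets the model see readings from the same period it is tested on.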

2.2.5 Processing of data

As an initial step in predicting groundwater quality parameters, water quality data were compiled for the selected sites over the period from February 2011 to February 2019. A total of five sites were selected for modeling. The main water quality variables chosen were pH, EC, TDS, calcium, magnesium and TON. Because the input and output variables differ widely in order of magnitude, rescaling the data is recommended so that more reliable predictions can be made. Data are commonly normalized to the interval [0, 1] [25], and in this study the variables were rescaled to [0, 1], which covers the full range of the data sets used to build the ANN predictive models.
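Rescaling to [0, 1] is the column-wise min-max transform x' = (x - min) / (max - min). The sketch below assumes the minimum and maximum are taken from the training data only (the paper does not say) and keeps them so that normalized network outputs, such as predicted mercury, can be mapped back to μg/l. The helper names and sample values are hypothetical.

```python
import numpy as np


def minmax_fit(X):
    """Column-wise minimum and range, computed on the data used for fitting."""
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0          # avoid division by zero for constant columns
    return lo, span


def minmax_transform(X, lo, span):
    """Rescale each column to [0, 1] using x' = (x - min) / (max - min)."""
    return (X - lo) / span


def minmax_inverse(Xs, lo, span):
    """Map normalized values (e.g. network outputs) back to original units."""
    return Xs * span + lo


# Hypothetical readings of pH, EC (uS/cm) and TON (mg/l); not data from the study.
X = np.array([[7.2, 650.0, 4.1],
              [7.8, 910.0, 6.3],
              [7.5, 780.0, 5.0]])
lo, span = minmax_fit(X)
Xs = minmax_transform(X, lo, span)
print(Xs.round(3))
```

Test readings transformed with the training-set `lo` and `span` can fall slightly outside [0, 1]; that is expected and avoids leaking test information into the scaling.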

2.2.6 Data training

The objective of the training process is to obtain an accurate ANN model structure. Training requires choosing the transfer function, learning rate, momentum and stopping (exit) conditions, and evaluating the model through the root mean square error (RMSE) and validation.

In this study, the ANN models for predicting water quality variables were trained and tested using the Neural Network Toolbox in MATLAB. The MLP network is trained by backpropagation incorporating the Levenberg-Marquardt algorithm, with the hyperbolic tangent as the activation function of the hidden-layer neurons and a linear activation function in the output-layer neurons. The RBF network is trained with the Orthogonal Least Squares algorithm, with the Gaussian radial basis function as the activation function in the hidden layer and a linear activation function in the output layer [27].
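The OLS training of the RBF network can be sketched as forward selection of centres, following the general idea of the OLS procedure credited to [7]: every training point is a candidate Gaussian centre, candidate columns are orthogonalized by Gram-Schmidt against those already chosen, and the one with the largest error reduction ratio is added until `n_centers` are selected. Output weights are then fitted by linear least squares. This is a simplified illustration, not the MATLAB routine used in the study; the fixed `width`, the number of centres, the absence of a bias term and all function names are assumptions.

```python
import numpy as np


def gaussian_design(X, centers, width):
    """phi[i, j] = exp(-||x_i - c_j||^2 / (2 * width^2))."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width**2))


def ols_select(P, y, n_centers):
    """Greedy forward selection of columns of P by error reduction ratio."""
    selected, basis = [], []
    yy = y @ y
    for _ in range(n_centers):
        best, best_err, best_w = None, -1.0, None
        for j in range(P.shape[1]):
            if j in selected:
                continue
            w = P[:, j].copy()
            for q in basis:                       # classical Gram-Schmidt step
                w -= (q @ P[:, j]) / (q @ q) * q
            ww = w @ w
            if ww < 1e-12:                        # column adds nothing new
                continue
            err = (w @ y) ** 2 / (ww * yy)        # share of y's energy explained
            if err > best_err:
                best, best_err, best_w = j, err, w
        if best is None:
            break
        selected.append(best)
        basis.append(best_w)
    return selected


def fit_rbf_ols(X, y, n_centers=10, width=0.5):
    """Candidate centres are the training points; output weights by least squares."""
    P = gaussian_design(X, X, width)
    idx = ols_select(P, y, n_centers)
    weights, *_ = np.linalg.lstsq(P[:, idx], y, rcond=None)
    return X[idx], weights, width


def predict_rbf(X, model):
    """Gaussian hidden layer followed by a linear output layer."""
    centers, weights, width = model
    return gaussian_design(X, centers, width) @ weights


# Synthetic, normalized inputs (not the study's data).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(60, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2
model = fit_rbf_ols(X, y, n_centers=12, width=0.3)
rmse = np.sqrt(np.mean((predict_rbf(X, model) - y) ** 2))
print(len(model[0]), "centres, training RMSE", round(float(rmse), 4))
```

Selecting centres this way keeps the hidden layer small, which matters with only 89 readings: using every training point as a centre would simply interpolate the training data.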

2.2.7 Models testing

Once training is complete, the performance of the trained network must be tested on an unseen data set that was not used at any point during training. If, after testing, the error on the test set is much larger than the error on the training set, either the two data sets are not representative of the same population or the predictive model is overfitted [18].

2.2.8 Developed ANN predictive models procedure

Detailed descriptions of the neural network approach can be found elsewhere [12, 20]. An ANN model based on the backpropagation algorithm and a model-based approach using process simulation are two commonly used modeling approaches. A comparison of their results showed that both are capable of accurate prediction, but the accuracy of a neural network model depends on the proper selection of input variables and on the breadth of the data range on which the network is trained [2]. A neural network generally consists of at least three layers: an input layer, one or more hidden layers and an output layer.

Two types of feed-forward network, MLP and RBF neural networks, were used to construct the ANN predictive models; both were trained on the collected data to build predictive models for water quality parameters. Before running any model, the data sets were normalized to the interval [0, 1]. Six scenarios were used to train the MLP and RBF networks in order to choose the best-fit model for predicting water quality parameters, with emphasis on mercury. The input parameters of the six scenarios used to train all the predictive models developed in this study are given below:

  • Scenario 1: pH, EC, TDS, calcium, magnesium and TON

  • Scenario 2: pH, EC, TDS, magnesium and TON

  • Scenario 3: pH, EC, TDS and TON

  • Scenario 4: pH, EC and TON

  • Scenario 5: pH and EC

  • Scenario 6: EC
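The six scenarios differ only in which columns feed the network, so they can be stored as data and evaluated in one loop. In this sketch, `SCENARIOS` mirrors the list above, while `evaluate_scenarios`, `linear_fit_predict` and the synthetic data are hypothetical; ordinary least squares stands in for the MLP or RBF network, which would be passed as `fit_predict` instead.

```python
import numpy as np

# Input sets for the six scenarios listed above.
SCENARIOS = {
    1: ["pH", "EC", "TDS", "calcium", "magnesium", "TON"],
    2: ["pH", "EC", "TDS", "magnesium", "TON"],
    3: ["pH", "EC", "TDS", "TON"],
    4: ["pH", "EC", "TON"],
    5: ["pH", "EC"],
    6: ["EC"],
}


def evaluate_scenarios(train, test, target, fit_predict):
    """Return test RMSE per scenario; train/test map column name -> 1-D array."""
    scores = {}
    for sid, cols in SCENARIOS.items():
        X_tr = np.column_stack([train[c] for c in cols])
        X_te = np.column_stack([test[c] for c in cols])
        y_pred = fit_predict(X_tr, train[target], X_te)
        scores[sid] = float(np.sqrt(np.mean((test[target] - y_pred) ** 2)))
    return scores


def linear_fit_predict(X_tr, y_tr, X_te):
    """Stand-in model (least squares with intercept); swap in an MLP or RBF network."""
    A = np.column_stack([np.ones(len(X_tr)), X_tr])
    coef, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    return np.column_stack([np.ones(len(X_te)), X_te]) @ coef


# Synthetic placeholder data, already in [0, 1]; not the study's measurements.
rng = np.random.default_rng(0)
names = ["pH", "EC", "TDS", "calcium", "magnesium", "TON", "Hg"]
data = {c: rng.uniform(0, 1, 89) for c in names}
data["Hg"] = 0.5 * data["TON"] + 0.3 * data["EC"] - 0.2 * data["pH"] + rng.normal(0, 0.02, 89)
train = {c: v[:71] for c, v in data.items()}
test = {c: v[71:] for c, v in data.items()}
scores = evaluate_scenarios(train, test, "Hg", linear_fit_predict)
print(scores, "best scenario:", min(scores, key=scores.get))
```

Holding the split and the model fixed while only the input columns change is what makes the per-scenario errors comparable.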

2.2.9 Statistical analysis tools

The water quality data were used to develop ANN predictive models for assessing the quality of groundwater at Naameh Landfill. The data were entered into Microsoft Excel sheets, imported into the Statistical Package for the Social Sciences (SPSS) and summarized using the minimum, maximum, mean and standard deviation. In addition, the Pearson correlation coefficient (a measure of linear association) was used to measure the linear association among the selected parameters.
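The descriptive statistics and Pearson coefficient described above can be reproduced outside SPSS. The sketch below uses the sample standard deviation (n - 1 in the denominator) and computes r as the covariance divided by the product of the standard deviations; the helper names and the sample values are hypothetical, and `numpy.corrcoef` returns the same r.

```python
import numpy as np


def describe(x):
    """Minimum, maximum, mean and sample standard deviation (ddof=1)."""
    x = np.asarray(x, dtype=float)
    return {"min": x.min(), "max": x.max(), "mean": x.mean(), "std": x.std(ddof=1)}


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    y = np.asarray(y, dtype=float) - np.mean(y)
    return float((x @ y) / np.sqrt((x @ x) * (y @ y)))


# Hypothetical mercury (ug/l) and pH values, for illustration only.
hg = [0.06, 0.08, 0.07, 0.10, 0.09]
ph = [7.9, 7.4, 7.6, 7.1, 7.3]
print(describe(hg))
print(round(pearson_r(hg, ph), 3))
```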

The developed ANN models were trained and tested using the Neural Network Toolbox in MATLAB. Two types of feed-forward network were used: multilayer perceptron and radial basis function neural networks. The root mean square error (RMSE), mean absolute error (MAE), error percentage and correlation were used to evaluate the models during training and testing.
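The evaluation measures can be written in a few lines. RMSE and MAE follow their standard definitions; the paper does not define its error percentage, so the range of absolute relative errors, |predicted - observed| / observed × 100, is assumed here. Function names and sample values are hypothetical.

```python
import numpy as np


def rmse(obs, pred):
    """Root mean square error."""
    e = np.asarray(pred, float) - np.asarray(obs, float)
    return float(np.sqrt(np.mean(e**2)))


def mae(obs, pred):
    """Mean absolute error."""
    e = np.asarray(pred, float) - np.asarray(obs, float)
    return float(np.mean(np.abs(e)))


def error_pct_range(obs, pred):
    """Min and max of |pred - obs| / |obs| * 100 (assumed definition)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    pct = 100.0 * np.abs(pred - obs) / np.abs(obs)
    return float(pct.min()), float(pct.max())


def correlation(obs, pred):
    """Pearson r between observed and predicted values."""
    return float(np.corrcoef(obs, pred)[0, 1])


# Hypothetical observed and predicted mercury values (ug/l), for illustration only.
obs = [0.06, 0.08, 0.07, 0.10]
pred = [0.065, 0.078, 0.072, 0.094]
print(rmse(obs, pred), mae(obs, pred), error_pct_range(obs, pred), correlation(obs, pred))
```

The relative error is undefined for zero observations, which is not an issue for strictly positive mercury readings; RMSE and MAE should be reported in the same units as the data they were computed on (normalized or μg/l).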

3 Results and discussion

Mercury concentrations ranged from 0.06 μg/l to 0.1 μg/l in the training data set and from 0.05 μg/l to 0.09 μg/l in the testing data set. The average values over all groundwater points were 0.08 μg/l and 0.07 μg/l for the training and testing data sets, respectively.

The statistical analysis showed a moderate positive correlation between EC, TDS and TON for both the training and testing data sets, with r values of (0.43, 0.56), (0.43, 0.56) and (0.51, 0.49), respectively. A moderate negative correlation was detected between mercury and pH for both the training and testing data sets (r = -0.51 and -0.58, respectively). A weak positive correlation was observed between mercury, calcium and magnesium for both the training and testing sets, with r values of (0.23, 0.37), (0.27, 0.33) and (0.30, 0.39).

To predict the concentrations of Mercury in the groundwater of Naameh Landfill, feed forward MLP and RBF neural networks were used.

The results obtained from MLP networks trained with a number of different algorithms showed that the MLP network trained by backpropagation incorporating the LM algorithm is the best-fit model for predicting mercury concentrations in the groundwater. The prediction results of the one-hidden-layer and two-hidden-layer MLP networks were compared with those of the RBF neural network. The training performance of the one-hidden-layer MLP, two-hidden-layer MLP and RBF networks is presented in Figs. 4, 5 and 6, respectively.

Fig. 4 Training performance of the mercury MLP model with one hidden layer

Fig. 5 Training performance of the mercury MLP model with two hidden layers

Fig. 6 Training performance of the mercury RBF model

Figs. 4, 5 and 6 show that the one-hidden-layer MLP network performs slightly better than the two-hidden-layer MLP and the RBF network.

The evaluation of the mercury prediction results from all tested models for the six scenarios developed in this study is given in Table 3. The best performance on the testing data was achieved with scenario 4, whose input parameters were pH, EC and TON.

Table 3 Evaluation of Mercury ANN developed models for testing data set

Table 4 presents the correlation coefficients, performance (RMSE), MAE and error range (%) between the observed and predicted mercury concentrations for the best MLP and RBF models on the training and testing data sets.

Table 4 Statistical summary for mercury best developed models prediction results

The root mean squared error (RMSE) and mean absolute error (MAE) were calculated for the training and testing data sets and compared across all developed networks. The RMSE, MAE, correlation coefficients and error percentage ranges for both data sets are shown in Table 4. The RMSE for the testing data sets was close to that of the training data set, which indicates that the models predict similarly well on both data sets and are not markedly overfitted.

Comparing MLP networks trained with several different algorithms showed that the MLP network trained by backpropagation incorporating the LM algorithm is the most suitable model for predicting mercury concentrations in the groundwater of Naameh Landfill.

The performance of the MLP and RBF networks was verified on different unseen data sets, and their prediction results were satisfactory compared with the other models developed.

The correlations between the observed and predicted mercury values for the training data set were strong for both the MLP and RBF models. The one-hidden-layer and two-hidden-layer MLP models gave better predictions than the RBF model, with correlation coefficients of 0.88-0.90, 0.886-0.912 and 0.791-0.810, respectively (Figs. 7, 8 and 9).

Fig. 7 Regression of the one-hidden-layer MLP mercury model for training and testing data

Fig. 8 Regression of the two-hidden-layer MLP mercury model for training and testing data

Fig. 9 Regression of the RBF mercury model for training and testing data

The MLP model gave better predictions than the RBF model for both the training and testing data sets. The results demonstrate that the developed MLP and RBF neural network models predict mercury concentrations in the groundwater of Naameh Landfill satisfactorily, with predicted values from all tested models close to the measured values.

The predictions of the one-hidden-layer and two-hidden-layer MLP models for the training and testing data sets are compared with those of the RBF model in Figs. 10 and 11. The MLP models outperform the RBF model in forecasting mercury concentrations.

Fig. 10 Comparison of mercury MLP-1, MLP-2 and RBF model predictions (training data)

Fig. 11 Comparison of mercury MLP-1, MLP-2 and RBF model predictions (testing data)

The mercury model nevertheless showed comparatively weak prediction results, which may be attributed to the weak correlation coefficients between the selected input and output data. The performance of the ANN for predicting mercury levels could be improved further by selecting more appropriate water quality parameters as inputs.

A sensitivity analysis with respect to the input factors was then performed on the ANN estimates of the best network. To determine the sensitivity to, and impact of, the different input factors, the one-hidden-layer MLP network trained by backpropagation incorporating the LM algorithm was used.

Groundwater mercury concentrations were predicted from six selected parameters using different ANN models, and the best network structure for estimating mercury concentrations was determined. To study the effect of each input parameter on the network predictions and to identify parameters with less influence on concentration, the sensitivity to each selected parameter was evaluated (Fig. 12). The resulting order is TON > EC > pH > Ca > Mg > TDS.

Fig. 12 Sensitivity of each input index for the prediction of mercury

TON is therefore the dominant factor, followed by EC, pH, Ca, Mg and TDS. This result supports the choice of scenario 4, which used TON, pH and EC as input variables.
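The section does not state how the sensitivities in Fig. 12 were computed. One common approach, shown here purely as an assumed illustration, is one-at-a-time perturbation: each normalized input is shifted up and down by a small amount while the others are held fixed, and the inputs are ranked by the mean absolute change in the prediction. `oat_sensitivity`, the step size `delta` and the toy linear model are hypothetical; in practice `predict` would be the trained one-hidden-layer MLP.

```python
import numpy as np


def oat_sensitivity(predict, X, names, delta=0.1):
    """Rank inputs by the mean absolute output change when each is shifted by +/- delta."""
    base = predict(X)
    scores = {}
    for j, name in enumerate(names):
        up, down = X.copy(), X.copy()
        up[:, j] = np.clip(up[:, j] + delta, 0.0, 1.0)      # stay inside the [0, 1] range
        down[:, j] = np.clip(down[:, j] - delta, 0.0, 1.0)
        change = 0.5 * (np.abs(predict(up) - base) + np.abs(predict(down) - base))
        scores[name] = float(change.mean())
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


def toy_model(X):
    """Stand-in for a trained network: a fixed linear function of three inputs."""
    return X @ np.array([3.0, -1.0, 0.2])


X = np.random.default_rng(0).uniform(0, 1, size=(89, 3))
ranking = oat_sensitivity(toy_model, X, ["x1", "x2", "x3"])
print(ranking)
```

Because a nonlinear network's response depends on where the inputs sit, averaging the change over all readings (rather than perturbing around a single mean point) gives a ranking that reflects the whole data range.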

4 Conclusion

In conclusion, the prediction results show that both types of network are efficient at predicting mercury concentrations in the groundwater of the study area. Moreover, the results indicate that mercury residues will persist for the next two years even in the absence of discharge, since the landfill is already capped and no longer receives waste.

The models were developed from data collected at five sites in Naameh Landfill. Six scenarios were used to train the MLP and RBF networks in order to choose the best model for predicting mercury levels in groundwater. The best mercury predictions were obtained with scenario 4 (inputs: pH, EC and TON) using the one-hidden-layer MLP neural network (r = 0.92; RMSE = 0.44).

The one-hidden-layer and two-hidden-layer MLP networks were compared with RBF neural networks, and the MLP networks gave slightly better predictions. A sensitivity analysis of the effects of the input factors was also performed for the best network.

The input factors, ranked by their impact on the mercury model results for the training and testing data, are TON > EC > pH > Ca > Mg > TDS; TON therefore has the greatest influence on the developed model.

Our results indicate that mercury concentrations can be estimated approximately with an ANN model and that concentrations can be predicted once the best-fit model has been determined. The success of this model suggests that end users could apply it to a broader spectrum of metals with toxic characteristics similar to mercury, such as arsenic and nickel.

The ANN model developed in this study may therefore serve as a new predictive tool for assessing the significance of various water resources management scenarios. Used as a decision support tool, the developed ANN models could be very useful for preparing water management policies and setting up a suitable management scenario for Naameh Landfill.