Abstract
Streamflow is a key variable for water resources management. Its prediction becomes more important in semi-arid regions such as the Tensift river basin in Morocco, where water resources are facing a severe drought and demand is continuously increasing. The present analysis evaluates Machine Learning (ML) techniques, namely support vector regression (SVR) and Random Forest (RF), against multiple linear regression (MLR) for daily streamflow forecasting in the mountainous sub-basin of Rheraya between 2003 and 2016. In terms of mean validation performance, SVR exhibited the highest Nash–Sutcliffe efficiency score (NSE = 0.59), followed by MLR (NSE = 0.54) and RF (NSE = 0.53), while MLR had the lowest root mean squared error (RMSE = 1.01 \(\text {m}^3\,\text {s}^{-1}\)) compared to SVR and RF (RMSE = 1.18 \(\text {m}^3\,\text {s}^{-1}\) for both). Furthermore, the available time series was too short to properly capture the full range of streamflow variability, which reduced the prediction performance outside of the calibration conditions. These findings suggest that ML algorithms, particularly SVR, can provide streamflow estimates useful for water resources management in semi-arid regions when trained on a representative period.
1 Introduction
For numerous basins in the Mediterranean semi-arid regions, mountains are a major ‘water tower’ for the surrounding plains [1, 2] where agriculture plays a crucial role in the economy. This is the case in the Moroccan High Atlas Mountains, which are considered as the headwaters for the surrounding plains where agricultural, industrial, and touristic activities consume more than 90% of the available water resources [3, 4]. Previous studies indicated that this area experienced pronounced drought events during the last decades [5, 6]. Additionally, a recent study has projected a significant trend toward drier conditions in this region towards the end of the century [7]. Air temperatures have also been continuously increasing since 1989 [8] and the demand for water is expanding due to the growth of population and socio-economic activities. As such, water resources in the region are threatened due to their vulnerability to climate change and the low storage capacity and absence of artificial reservoirs for water regularization in some basins [9]. Consequently, the need for effective water resources management in the headwater region of the Moroccan High Atlas region is an urgent priority.
It is commonly known within the water management community that increasing water management efficiency based on streamflow forecasts is critical for reducing the impact of water scarcity [10]. Streamflow is a key variable in the water balance and quantifying the amount of water supplied in the watershed could be a first step toward better water management [11], but it remains challenging. At the scale of the Tensift basin, several previous works have used conceptual, process-based hydrological models to simulate and predict runoff and resulting streamflow, such as the Soil and Water Assessment Tool model (SWAT) [9], the Snowmelt Runoff Model (SRM) [12] and the Génie Rural à 2 paramètres model (GR2M) [8, 13] in the Rheraya sub-basin in Tensift.
The High Atlas is characterized by a high spatial and temporal heterogeneity of hydrological processes [9, 14,15,16], data scarcity and complex runoff generation mechanisms, which make the application of hydrological models for streamflow simulation a challenging task [3]. As such, Artificial Intelligence (AI)-based data-driven techniques could be a promising alternative to process-based models for hydrological simulations [17]. Various AI-based techniques have been widely used to forecast streamflow [18,19,20,21,22,23,24]. Their main objective is to decrease the estimated error between the simulated and the target variable [25]. Artificial Neural Network (ANN) techniques have been commonly applied for runoff modelling and achieved good results [26,27,28]. [29] successfully utilized the multilayer perceptron (MLP) and the radial basis neural network (RBFNN) to predict streamflow at multiple gauging stations in the agricultural Eucha watershed located in north-west Arkansas and north-east Oklahoma. In the ungauged Luvuvhu River in South Africa, [30] used RBFNN to perform 1-day forecasts of streamflow. Data scarcity in some developing countries may make the application of data-intensive models difficult, and [30] demonstrated that artificial neural networks are effective for streamflow forecasting in this context. [31] demonstrated the effectiveness of ANNs for monthly streamflow prediction in poorly gauged catchments as well. In parallel, the Support Vector Machine (SVM) has been popularized as a new statistical learning method and verified to be a robust and efficient algorithm for both classification and regression [32]. SVM methods are based on mapping the input data set into a high-dimensional feature space to resolve classification problems and reproduce the relationship between the input variables and the target. [33] studied the use of SVM for daily rainfall-runoff modeling and found it to provide accurate prediction of streamflow.
Seasonal and hourly multi-scale streamflow prediction was conducted by [34] using the SVM method which showed a promising performance. [35, 36] assessed SVM’s ability to forecast monthly streamflow. They concluded that SVM, when combined with techniques for input selection and tuning hyper-parameters, provides an effective streamflow prediction at a monthly step. In semi-arid environments, [37] examined three different data-driven methods for forecasting river flow: ANN, SVM and Adaptive Neuro-Fuzzy Inference System (ANFIS). Their study showed that SVM performed better than ANN and ANFIS. The aforementioned studies proved that AI techniques can offer a viable alternative to process-based hydrological models due to their capability of handling the non-linear and non-stationary nature of the hydrological processes [38]. Furthermore, the integration of Geographic Information Systems (GIS) with Artificial Intelligence (AI) models has emerged as a revolutionary approach in hydrological research, enabling improved spatial analysis and visualization capabilities. GIS serves as a robust instrument for evaluating and managing water resources by facilitating the spatial mapping of watershed characteristics, significantly enhancing AI-driven modeling efforts. For instance, [39] illustrated the effectiveness of GIS and remote sensing in assessing flood impact through morphometric parameters in the Niger Delta region. Additionally, the rise of hybrid AI models presents promising progress in capturing the intricate dynamics of hydrological systems. These models combine various AI techniques or incorporate AI with process-based models, potentially enhancing the precision of streamflow forecasts. [40] investigated an approach employing artificial neural networks combined with a hybrid technique utilizing wavelet transform to evaluate the water quality of the Tallo River in Indonesia, demonstrating the efficacy of hybrid AI approaches.
Similarly, [41] devised an enhanced multi-stage genetic programming model for streamflow prediction, emphasizing the potential of hybrid models in improving prediction accuracy and dependability compared to conventional models alone. These advancements in GIS integration and hybrid AI models emphasize the evolving terrain of hydrological modeling, presenting novel pathways for precise, dependable, and comprehensive water resource management solutions. As AI algorithms become widely available, their flexibility and accessibility make them worth evaluating for simulating hydrological response.
Considering the main challenges related to water resources within the Moroccan context, this research aims to make a valuable contribution to the field of water management. Firstly, the objective is to understand how ML techniques could simulate daily streamflow in the semi-arid and mountainous region of Tensift. Secondly, the goal is to select the optimal configuration that can serve as a foundation for future research to develop a hybrid modelling framework. Two ML techniques, SVR and RF, were selected for their robustness and effectiveness in addressing nonlinear relationships, and were subsequently compared against the conventional Multiple Linear Regression (MLR) model. The study focused on the Rheraya sub-basin given its unique climatic conditions and limited hydrological data, allowing for a deep exploration of machine learning’s potential in water management. Given the high temporal variability of the hydrological processes and resulting streamflow in this region, the temporal stability of these models was also assessed. The influence of the variable hydroclimatic conditions on the stability of the performance of the tested AI techniques was further analyzed and discussed.
2 Materials and methods
2.1 Study area
This study was conducted in the Rheraya sub-basin located in the High Atlas Mountains in Morocco. The watershed covers a surface area of 228 km\(^2\) and is characterized by a large altitudinal gradient ranging from 1060 m a.s.l. to 4167 m a.s.l. at the Jbel Toubkal summit, the highest summit in North Africa (Fig. 1). The hydro-climatic context is characterized by a strong heterogeneity both in space and time. The mean annual precipitation increases with elevation by an average of 166 mm km\(^{-1}\) [42]. The mean temperature lapse rate varies between \(-\) 4.39 and \(-\) 4.85\(^{\circ }\)C km\(^{-1}\) annually, from \(-\) 3.67 to \(-\) 5.21\(^{\circ }\)C km\(^{-1}\) monthly and from \(-\) 2.75 to \(-\) 7.1\(^{\circ }\)C km\(^{-1}\) daily [12, 42]. The Rheraya sub-basin is an appropriate site for this study due to its relatively dense network of measurement stations compared to other basins in Tensift, and its urgent need for water management.
2.2 Dataset
The datasets used in the present study are divided into three categories: (i) in-situ data, (ii) simulated data and (iii) satellite data, spanning a 13-year period from 1st September 2003 to 31st August 2016 (Table 1). The daily streamflow records between 2003 and 2016 were supplied by the Tensift hydraulic water agency (ABHT: https://www.eau-tensift.net). Figure 2 illustrates the seasonal variability of the daily observed streamflow and highlights the presence of several extreme values in each hydrological year. Meteorological data were provided by the International Joint Laboratory LMI-TREMA (Télédétection et Ressources en Eau en Méditerranée semi-Aride, lmi-trema.ma) [43]. Table 2 provides a comprehensive overview of the statistical values of streamflow and its predictors from 2003 to 2016. The predictors were collected from ten stations located around the watershed (Fig. 1). All meteorological series were subject to several pre-processing steps, including: (i) aggregating all variables to daily averages; (ii) deriving lapse rates of air temperature, relative humidity, and precipitation; and (iii) spatially distributing the daily variables over the whole Rheraya catchment by combining an altitudinal lapse rate and spatial interpolation of temperature and precipitation. Further details about these processing steps are provided by [44]. Snowmelt and snow water equivalent (SWE) were simulated by the classical temperature-index model (TI) using air temperature (Ta) as a unique index of melt energy. The model has a single coefficient, called the “degree-day factor” (DDF, in mm \(^{\circ }\)C\(^{-1}\) d\(^{-1}\)), which was previously calibrated by [45]. The snow cover area (SCA) was calculated using the Normalized Difference Snow Index (NDSI) from the MODIS Collection V006 (MODIS Snow Cover Daily L3 Global 500 m SIN GRID V006) [46].
SCA was derived from both MODIS daily snow cover products MOD10A1 and MYD10A1 generated respectively from the Terra and Aqua satellites [47]. SCA from Terra and Aqua were blended and clouds and other missing pixels were filled using a spatiotemporal filter implemented by [4]. More information is given by [44].
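The temperature-index idea described above can be illustrated with a minimal sketch. It is not the calibrated model of [45]: the degree-day factor value, the 0 °C melt threshold and the function name `degree_day_melt` are assumptions made for illustration, and snowfall accumulation is deliberately left out.

```python
import numpy as np

def degree_day_melt(ta, swe0, ddf=3.0, t_melt=0.0):
    """Daily melt (mm/d) from a simple temperature-index model.

    ta     : daily mean air temperature (deg C)
    swe0   : initial snow water equivalent (mm)
    ddf    : degree-day factor (mm degC^-1 d^-1); hypothetical value
    t_melt : melt threshold temperature (deg C); assumed to be 0
    """
    swe = float(swe0)
    melt = np.zeros(len(ta))
    for i, t in enumerate(ta):
        potential = ddf * max(t - t_melt, 0.0)  # melt allowed by temperature
        melt[i] = min(potential, swe)           # cannot melt more than is stored
        swe -= melt[i]                          # update the snowpack
    return melt

# Four synthetic days: one cold day, then progressive warming.
melt = degree_day_melt([-2.0, 1.0, 4.0, 6.0], swe0=20.0)
```

On the last day the potential melt exceeds the remaining snowpack, so melt is capped by the available SWE.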
2.3 Methods
The methodological framework involved three main steps, which are summarized in Fig. 3: (i) cross-correlation analysis between streamflow and potential predictors, (ii) model calibration and cross-validation, and (iii) streamflow simulation and model comparison. Step (i) is crucial for detecting statistically significant associations and optimal time lags that impact streamflow and hence improve the performance of the applied models. This step involves analyzing the importance factor of each potential predictor by using cross-correlation analyses to identify lagged relationships between predictors and streamflow, and the correlation matrix to detect predictor collinearity. Step (ii) applied a ‘leave-one-year-out’ cross-validation for the three models. The last step (iii) compared the mean performance of the three models using common error metrics, defined below. The subsequent sections present the details of each methodological step.
2.3.1 Support Vector Regression (SVR)
The Support Vector Machine (SVM), which was created by [32], is one of the effective techniques considered to solve classification (Support Vector Classification - SVC) and regression (Support Vector Regression - SVR) issues [48,49,53,54,55]. SVM consists of projecting inputs into a high-dimensional space based on a linear function and calibration using an ideal hyperplane to partition datasets by offset. The goal of SVR is to find a regression function f(x) that can adequately characterize the connection between the given input dataset \(x = \bigl \{x_1, x_2, x_3,\ldots , x_n\bigr \}\) and the target value \(y = \bigl \{y_1, y_2, y_3,\ldots , y_n\bigr \}\) as follows:
\[ f(x) = w \cdot \phi (x) + b \qquad (1) \]
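where w denotes the function weight vector, b the offset factor and \(\phi \) is the nonlinear mapping. To avoid overfitting the calibration data samples, SVR employs an objective function (Eq. 2) and a loss function (Eq. 3) to obtain the regression parameters in Eq. (1), written here in the standard \(\epsilon \)-insensitive form:
\[ \min _{w,\, b,\, \xi ,\, \xi ^*} \; \frac{1}{2}\Vert w\Vert ^2 + C\sum _{i=1}^{n}\left( \xi _i + \xi _i^*\right) \qquad (2) \]
\[ \text {subject to} \quad y_i - f(x_i) \le \epsilon + \xi _i, \quad f(x_i) - y_i \le \epsilon + \xi _i^*, \quad \xi _i,\, \xi _i^* \ge 0 \qquad (3) \]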
where \(\epsilon \) is a positive error threshold, C is a penalty coefficient defined by the user, and \(\xi _i\) and \(\xi _i^*\) are the positive slack variables that measure the deviation of the calibration data beyond \(\epsilon \). Prior to model calibration, the hyperparameters, i.e., the kernel type (linear, polynomial, sigmoid or Gaussian radial basis function (RBF)) and the corresponding kernel parameters, were optimized using the GridSearchCV algorithm from the Python library Sklearn, as demonstrated in [56]. The SVR model was implemented with the linear kernel, as selected by GridSearchCV.
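A kernel and parameter search of this kind might be set up as in the following sketch. The synthetic data, the grid values, the 3-fold split and the standardization step are all assumptions for illustration; the study's actual grid and pre-processing are not given in the text.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for three predictors and a target (not the study data).
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))
y = X @ np.array([1.5, -0.5, 0.8]) + 0.1 * rng.normal(size=120)

# Scaling before SVR is an assumption here, not a step reported in the paper.
pipe = make_pipeline(StandardScaler(), SVR())

# Hypothetical grid over kernel type, penalty C and threshold epsilon.
param_grid = {
    "svr__kernel": ["linear", "rbf", "poly", "sigmoid"],
    "svr__C": [0.1, 1.0, 10.0],
    "svr__epsilon": [0.01, 0.1],
}
search = GridSearchCV(pipe, param_grid, scoring="neg_mean_squared_error", cv=3)
search.fit(X, y)

best_kernel = search.best_params_["svr__kernel"]
```

`search.best_estimator_` then holds the pipeline refitted on all the data with the selected configuration.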
2.3.2 Random Forest (RF)
The RF model was proposed by [58]. The number of trees (NTree) and the number of predictors chosen at each tree split (NPred) are the two most sensitive hyper-parameters of the RF algorithm [59]. Since adding trees does not cause the RF algorithm to overfit the data, the number of trees does not have a significant impact on the model performance [60] and a minimum of 500 trees is typically adequate (1300 trees were used in this study). However, reducing the number of predictors at each split affects the computation time, the correlations between trees and their predictive ability [61]. In this study, the RF model was implemented using the RandomForestRegressor class of the Python library Sklearn. The hyperparameters (NTree and NPred) were optimized using the same grid search process as for SVR, and the hyperparameter values that gave the optimal score (mean squared error) were retained.
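As a minimal sketch of this tuning step, NTree and NPred correspond to the `n_estimators` and `max_features` arguments of `RandomForestRegressor`. The data are synthetic and the grid values are hypothetical, with tree counts kept far below the 1300 used in the study so that the example runs quickly.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic non-linear target built from three predictors (not the study data).
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=150)

param_grid = {
    "n_estimators": [50, 100],  # NTree (small, hypothetical values)
    "max_features": [1, 2, 3],  # NPred: predictors tried at each split
}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_squared_error",  # score named in the text
    cv=3,
)
search.fit(X, y)
best_rf = search.best_estimator_
```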
2.3.3 Multiple Linear Regression (MLR)
Multiple Linear Regression (MLR) is an extension of the ordinary least squares (OLS) regression technique. MLR establishes the linear connection between explanatory inputs and a dependent variable. Each value of the independent variable x is associated with a value of the dependent variable y. Given n predictors, the equation of MLR can be described as follows:
\[ y = \alpha _0 + \sum _{i=1}^{n} \alpha _i x_i + \epsilon \qquad (4) \]
where y is the target; \(x_i\) is a predictor; \(\alpha _0\) is the constant term; \(\alpha _i\) is the predictor’s slope coefficient; and \(\epsilon \) is the residual error term. The parameters are estimated by the least-squares approach during calibration, which seeks the best agreement between the target and the predictors [62]. Contrary to SVR and RF, the simple MLR procedure does not require optimizing hyperparameters.
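The least-squares estimation can be sketched with NumPy alone. The coefficients and data below are synthetic and noise-free, chosen only so that the fitted values can be checked against known ones.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))                  # three synthetic predictors

# Hypothetical coefficients: alpha_0 (constant term), then alpha_1..alpha_3.
alpha_true = np.array([0.5, 2.0, -1.0, 0.3])
y = alpha_true[0] + X @ alpha_true[1:]         # noise-free target

A = np.column_stack([np.ones(len(X)), X])      # column of ones for alpha_0
alpha_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
```

Because the target has no noise, `alpha_hat` recovers `alpha_true` to numerical precision; with real streamflow data the residual term would absorb the unexplained variance.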
2.4 Evaluation criteria
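To assess the performance of the models, the Nash–Sutcliffe efficiency (NSE, Eq. 5) and the Root Mean Squared Error (RMSE, Eq. 6) were used:
\[ NSE = 1 - \frac{\sum _{i=1}^{n} \left( y_i - \hat{y}_i\right) ^2}{\sum _{i=1}^{n} \left( y_i - \bar{y}\right) ^2} \qquad (5) \]
\[ RMSE = \sqrt{\frac{1}{n}\sum _{i=1}^{n} \left( y_i - \hat{y}_i\right) ^2} \qquad (6) \]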
where \(y_i\) is the observed streamflow, \(\hat{y}_i\) is the predicted streamflow, \(\bar{y}\) is the mean observed streamflow and n is the number of observations.
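Both criteria can be computed in a few lines. The sketch below uses their standard definitions with the symbols above; the function names `nse` and `rmse` and the four-value series are illustrative only.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))

def rmse(obs, sim):
    """Root mean squared error, in the units of the observations."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

obs = np.array([1.0, 2.0, 3.0, 4.0])   # synthetic observed flows
sim = np.array([1.0, 2.0, 3.0, 5.0])   # synthetic simulated flows

print(nse(obs, sim), rmse(obs, sim))   # approximately 0.8 and 0.5
```

A simulation equal to the observed mean gives NSE = 0, which is why NSE is read as skill relative to that baseline.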
2.5 Model construction and validation
2.5.1 Predictor selection
Predictor selection consists of decreasing the number of input variables for a predictive model. It reduces the inputs to those that are the most useful to the model and removes predictors that are non-informative or redundant for predicting the target variable. In general, the selection of appropriate predictors has a positive effect on the performance of ML models [35, 63]. First, cross-correlation analysis based on Spearman’s rank coefficient was used to detect monotonic (linear or non-linear) correlations between the streamflow and the predictors at lag times of up to 15 days. Then, the correlation matrix was used to highlight predictor collinearity. Uncorrelated predictors that had a significant correlation with the target (streamflow), with the threshold fixed at 0.19 given the generally low correlations observed, were retained as potential inputs for the ML models.
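The lagged cross-correlation step can be sketched as follows, assuming SciPy is available. The helper name `lagged_spearman` and the synthetic series, in which the target responds to the predictor two days later, are assumptions for illustration.

```python
import numpy as np
from scipy.stats import spearmanr

def lagged_spearman(predictor, target, max_lag=15):
    """Spearman correlation between target[t] and predictor[t - lag]."""
    predictor = np.asarray(predictor, float)
    target = np.asarray(target, float)
    out = {}
    for lag in range(max_lag + 1):
        x = predictor[: len(predictor) - lag]  # predictor at day t - lag
        y = target[lag:]                       # target at day t
        out[lag] = spearmanr(x, y)[0]          # first element is the coefficient
    return out

rng = np.random.default_rng(3)
p = rng.gamma(shape=0.5, scale=4.0, size=500)    # skewed, precipitation-like
q = np.roll(p, 2) + 0.1 * rng.normal(size=500)   # responds two days later

corr = lagged_spearman(p, q)
best_lag = max(corr, key=lambda k: abs(corr[k]))
```

Here `best_lag` recovers the two-day delay built into the synthetic series; applied to each candidate predictor, the same function yields correlograms of the kind used for lag selection.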
2.5.2 Model calibration
Model calibration and model validation are two important steps in the development and evaluation of a machine learning model. They are critical for ensuring that the model is reliable and accurate. Model calibration focuses on adjusting the model’s predictions to improve its accuracy and reliability. This step is often based on experimental data, with which the model is fitted using both the predictors and the target. Model validation, in contrast, is the process of assessing the accuracy of the model by comparing its predictions to observed data that were not used in the calibration process. This step tests the ability of the model to generalize to new and unseen data. The cross-validation method was first introduced by [64,65,66]. It consists of using different portions of the data to train and test the model over successive iterations. There are several types of cross-validation, but the most common is k-fold cross-validation. In k-fold cross-validation, the dataset is randomly divided into k groups (or folds) of approximately equal size. The model is trained on all but one of the subsets (k-1), and then evaluated on the remaining subset. This process is repeated k times, with a different subset reserved for evaluation (and excluded from training) each time. Cross-validation helps to detect overfitting and provides insight into how the model will generalize to new data. In the current study, the calibration and validation of the models were performed using a ‘leave-one-year-out’ method, which is a variant of k-fold cross-validation. It consists of removing one hydrological year at a time from calibration (training), training the model on the remaining years, and validating (testing) on the left-out year. The process is repeated 13 times (13-fold CV), once for each of the 13 hydrological years from September 2003 to August 2016.
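A leave-one-year-out split of this kind can be sketched with `LeaveOneGroupOut` from Sklearn, using the hydrological year (September to August) as the group label. The data are synthetic and the linear model is only a placeholder; labelling each year by its starting calendar year is an assumption of this sketch.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneGroupOut

dates = pd.date_range("2003-09-01", "2016-08-31", freq="D")
# Hydrological year, labelled by the calendar year in which it starts.
hydro_year = np.where(dates.month >= 9, dates.year, dates.year - 1)

# Synthetic predictors and target (not the study data).
rng = np.random.default_rng(4)
X = rng.normal(size=(len(dates), 3))
y = X @ np.array([1.0, 0.5, -0.2]) + 0.1 * rng.normal(size=len(dates))

logo = LeaveOneGroupOut()
scores = {}
for train_idx, test_idx in logo.split(X, y, groups=hydro_year):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    year = int(hydro_year[test_idx][0])
    # For a regressor, .score returns R^2 computed against the mean of the
    # test-year observations, which has the same form as the NSE.
    scores[year] = model.score(X[test_idx], y[test_idx])
```

The `scores` dictionary then holds one validation score per left-out hydrological year, which is the quantity examined in the model stability analysis.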
3 Results
3.1 Predictor selection
The cross-correlation analysis reveals the relationship between the current day’s streamflow (day t) and its predictors over multiple lag times, up to 15 days before the streamflow event (Fig. 4). Daily streamflow can be explained by the antecedent streamflow and any additional runoff during a given day [67]. Consequently, antecedent streamflow, which occurs in the days preceding the current day, was considered as a useful potential predictor in this study in addition to climatic variables. The maximum correlation between streamflow and its antecedent values occurs at a lag of one day (lag 1) and decreases gradually thereafter. For the other hydroclimatic variables, streamflow is most strongly and positively correlated with precipitation at lag 0, followed by SWE and SCA at a lag of one day (lag 1). Then come the weakest correlations with Melt at lag 2 (r = 0.21), humidity (RH, r = 0.20) at lag 0, and temperature (Ta, r = \(-\) 0.19) at lag 0, with the magnitude of all correlations decreasing over time thereafter.
In parallel, the correlation matrix (Fig. 5) reveals potential collinearity among predictors. Besides the hydroclimatic variables, the streamflow on the three preceding days was considered as a potential predictor based on its high correlation with streamflow at day t, as shown in the cross-correlation analysis (Figs. 4 and 5). \(SWE_{t-1}\) and \(SCA_{t-1}\) are highly and positively correlated (r = 0.76), while Ta is negatively correlated with humidity (RH) (r = \(-\) 0.51), \(SWE_{t-1}\) (r = \(-\) 0.54) and \(SCA_{t-1}\) (r = \(-\) 0.64). \(Melt_{t-2}\) is moderately correlated with \(SCA_{t-1}\) (r = 0.36) and \(SWE_{t-1}\) (r = 0.46), while streamflow during the three previous days is strongly inter-correlated. Consequently, precipitation on the current day (\(P_{t}\)), streamflow on the preceding day (\(Q_{t-1}\)) and snow cover area on the preceding day (\(SCA_{t-1}\)) were chosen as potential, uncorrelated predictors. \(SCA_{t-1}\) was chosen instead of \(SWE_{t-1}\), even though SWE at day t-1 had a higher correlation with streamflow. This is because the difference in correlation between the two predictors was small and satellite-derived SCA has a greater operational potential than simulated SWE. Also, SWE is simulated with a temperature-index approach using station-based precipitation measurements [45], which are sparse and of variable quality in the catchment [68], so the simulation probably underestimates the spatial variability of SWE. SCA derived from MODIS, on the other hand, represents a direct observation of snow cover conditions, even if the 500 m resolution does not fully capture the spatial heterogeneity of the snow cover [44].
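Assembling the three retained predictors amounts to shifting the daily series by the selected lags. The sketch below uses pandas with a five-day table of made-up values; the column names are illustrative.

```python
import pandas as pd

# Hypothetical daily values: streamflow Q, precipitation P, snow cover area SCA.
df = pd.DataFrame(
    {
        "Q": [1.0, 1.2, 3.5, 2.0, 1.5],
        "P": [0.0, 8.0, 2.0, 0.0, 0.0],
        "SCA": [40.0, 38.0, 45.0, 44.0, 41.0],
    },
    index=pd.date_range("2004-01-01", periods=5, freq="D"),
)

features = pd.DataFrame(
    {
        "P_t": df["P"],                 # precipitation on the current day
        "Q_t-1": df["Q"].shift(1),      # streamflow on the preceding day
        "SCA_t-1": df["SCA"].shift(1),  # snow cover area on the preceding day
    }
).dropna()                              # the first day has no antecedent values

target = df["Q"].loc[features.index]    # streamflow at day t, aligned by date
```

Dropping the first row keeps predictors and target aligned on the same dates, so no value from day t leaks into the lagged columns.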
3.2 Model mean performance and inter-comparison
During calibration (training), all models exhibited good performance, with NSE values above 0.65 and RMSE values under 1.40 \(\text {m}^3\,\text {s}^{-1}\) (Fig. 6). In the validation phase, although NSE values decreased, they remained above 0.50, while the RMSE decreased for SVR and MLR but increased for the RF model, with NSE = 0.59, 0.53 and 0.54 and RMSE = 1.18, 1.18 and 1.01 \(\text {m}^3\,\text {s}^{-1}\) for SVR, RF and MLR, respectively. Overall, the satisfactory NSE range (\(0.53 \le NSE \le 0.59\)) [69] during validation underscores the robustness of the models, particularly the SVR model, which outperformed the others with an NSE of 0.59. For a more detailed comparison of the models’ performances, Fig. 7 illustrates the observed streamflow and the streamflow simulated by the three models during the test year of 2011–2012.
3.3 Model stability
Despite the generally acceptable performance highlighted in Fig. 6, annual cross-validation tests revealed significant instability of the models across the different annual tests, as depicted in Fig. 8a. During the annual cross-validation tests, the NSE ranged widely from as low as 0.05–0.16 to as high as 0.81–0.87. While the three models had their lowest performance during the 2012/2013 hydrological test year, the behavior of each model differed to some extent during other test years. Notably, RF and SVR covaried more closely with each other than with MLR. For the 2010/2011 test year, SVR and RF had a low performance (NSE = 0.29 and 0.38, respectively), while MLR exhibited its highest performance (NSE = 0.85). Moreover, MLR had a low performance (NSE = 0.26) in 2004/2005 when RF and SVR performed well (NSE = 0.69 and 0.71, respectively). Thus, the performance of the three models was somewhat heterogeneous.
Fig. 8 Model performance during annual cross-validation and the distribution of hydroclimatic conditions during the calibration and validation periods. a Nash–Sutcliffe efficiency (NSE) score for each left-out year during cross-validation; b–f distribution of hydroclimatic conditions during the multiannual calibration and annual validation periods. A logarithmic transformation was applied to all positive data with skewed distributions. b Log-transformed streamflow; c log-transformed precipitation; d log-transformed snow cover area; e relative humidity; f air temperature over the 2003–2016 period
4 Discussion
4.1 General model performance
Passing from calibration to validation, the models exhibited almost the same behavior, with both the NSE and the RMSE decreasing, except for the RF model, for which the RMSE increased (Fig. 6). Overall, the SVR model showed the highest performance during validation in terms of NSE (0.59), followed by MLR (0.54) and RF (0.53), while the MLR model outperformed the other models in terms of RMSE (RMSE = 1.01, 1.18, 1.18 \(\text {m}^3\,\text {s}^{-1}\) for MLR, SVR and RF, respectively). Considering the challenges mentioned above, especially the quality of the ground data measurements and their spatial distribution, a mean performance of NSE = 0.55 for the three models is acceptable as a first step towards an integrated modelling approach that includes artificial intelligence and hydrological models, remotely sensed data, and the physical processes. Previous studies have demonstrated the successful application of SVR and RF for streamflow prediction [36, 37, 67, 70], as they provide good accuracy when dealing with abrupt streamflow fluctuations [71] and for predicting peak flow [72]. MLR has also been widely applied to streamflow prediction problems [62, 73, 74]. However, the accuracy achieved by the three models (\(NSE < 0.60\)) in the present study was lower than in similar studies conducted either in wet or in arid regions [62, 75, 76]. In wet regions, numerous studies have demonstrated the effectiveness of SVR, RF and MLR in modelling streamflow. [77] applied SVR to simulate monthly streamflow in the Kurau River in Malaysia with an \(R^2\) equal to 0.71, demonstrating SVR’s capability in capturing complex hydrological patterns in tropical climates. [67] simulated daily streamflow of the North American River and the Chehalis River in the US using SVR and obtained NSE scores of 0.83 and 0.93, respectively. RF was also used for annual streamflow prediction in the source region of the Yangtze River (SRYR) with an NSE of 0.82 [72].
[78] modelled the monthly mean streamflow over three different stations in Turkey (Durucasu, Sutluce and Kale) with a good performance expressed by a correlation (R) value greater than 0.80. Furthermore, MLR was tested in the upper reaches of the Yangtze River in China for a ten-day streamflow forecast, yielding an NSE greater than 0.80 [79]. In a different study, MLR was employed for short-term streamflow forecasting in the East River basin in China, yielding an R value greater than 0.90 [80]. In arid and semi-arid regions, SVR, RF and MLR models have also resulted in accurate streamflow modelling. [51] simulated the monthly streamflow of the semi-arid Wei River Basin in China using SVR and achieved a very high performance of NSE = 0.99. Another study conducted by [81] in the Jinsha River basin, Southwest China, modelled monthly streamflow with SVR with an NSE equal to 0.96. Besides, in the Sevier River Basin located in South-Central Utah, USA, [34] applied SVR to predict hourly streamflow and obtained an \(R^2\) equal to 0.97. On the other hand, [82] used an RF model for monthly streamflow forecasting at the Aswan High Dam (AHD) in Egypt, which gave an \(R^2\) equal to 0.90. In the Karaj reservoir in Iran, [83] modelled daily streamflow with an NSE equal to 0.97. Furthermore, [73] used an MLR model for daily river flow prediction over the Seybouse River Basin located in northeastern Algeria and obtained an \(R^2\) equal to 0.90. In another arid region, the Karoon River in Iran, MLR yielded an \(R^2\) equal to 0.74 when simulating daily river flow [84]. Consequently, the accuracy of the three daily prediction models obtained in the present study is lower than that usually achieved in other studies, including those in arid regions at a daily time scale.
4.2 Model stability and relation to hydroclimatic conditions
The annual cross-validation tests demonstrated notable variability in the performance of the SVR, RF and MLR models, highlighting a fundamental challenge in hydrological forecasting. Not only were the performances heterogeneous from one annual split to the other for the same model, but they also differed from one model to another for the same split. Despite the known ability of SVR and RF to handle hydrological forecasting [79, 85,86,87,88,89], they failed to perform well for all the splits in the present study. We suggest that the quality of streamflow records, often marred by missing or inaccurate data, coupled with the heterogeneous hydro-climatic conditions in the studied area, likely contributed to the observed performance instability.
Streamflow measurements in Rheraya are subject to uncertainties since they are based on a non-updated stage-discharge rating curve. In practice, streamflow measurements depend on the rating curve and the measured water depth. Due to large streamflow events that episodically occur in the semi-arid and mountainous Rheraya sub-basin (see extreme values in Fig. 2), the riverbed can be significantly reworked during these events, which affects the rating curve. Therefore, rating curves must be updated regularly. However, doing so is costly and time-consuming for the Tensift hydraulic water agency (ABHT). This results in a non-updated rating curve and a loss of accuracy in streamflow measurements [9, 43, 90, 91]. This suggests that data quality may affect the stability of the model accuracies. Besides, streamflow in Rheraya is also characterized by strong temporal variability (Fig. 2). To illustrate this, we analyzed the differences in streamflow distribution between the calibration and validation periods for each annual split of the cross-validation (Fig. 8b). The streamflow was log-transformed to address its highly skewed distribution, enabling a more accurate comparison between calibration and validation periods. Over the 13-year study period, the lowest performance was obtained for 2012/2013, for all models (Fig. 8a). For this validation year, the range of the streamflow distribution during the test was included in that of the calibration period. In contrast, for the hydrological year 2010/2011, which had a low performance for the SVR and RF models, the range of the streamflow distribution during the test exceeded that of the calibration period. For the validation years for which simulations by the three models were relatively satisfactory, mainly 2006/2007, 2009/2010 and 2011/2012, the range of the streamflow distribution during validation was either equal to, or fell within, the range of the streamflow distribution of the calibration period.
These results suggest that model performance was closely tied to the range of the streamflow distribution in the calibration and validation periods. We suppose that when the range of the streamflow distribution during the validation year was equal to or included within the range of the calibration distribution, the models provided adequate accuracy (\(NSE \ge 0.50\)). Conversely, when the range of the streamflow distribution for the validation year was larger than that of the calibration set, the performance degraded. The variation in streamflow distribution over the different splits is mainly caused by the strong temporal variability of the hydrological processes in the Rheraya sub-basin [9, 90,91,92]. To understand the underpinnings of streamflow variability, we examined the annual variations in the distribution of other hydroclimatic variables (P, SCA, Ta and RH) during both the calibration and validation periods (Fig. 8c–f). Precipitation and SCA were also log-transformed to better portray their highly skewed distributions and aid the analysis of their temporal variations. Although Ta and RH were not used as predictors in our models, analyzing their distributions can help understand the variability of other critical predictors like SCA and precipitation. A more restricted distribution of precipitation (P) during calibration than validation is frequently seen, which in some years matches the same phenomenon in streamflow and the aforementioned degradation in model performance (e.g., 2010/2011: RF and SVR; 2012/2013: all models). The same point is noticed for the other variables (SCA, RH and Ta), although their calibration and validation distributions are more balanced. [8, 14, 93, 94] highlighted the significant seasonal and interannual variability in the hydrological processes, which explains the frequent imbalance between the calibration and validation distributions seen in Fig. 8.
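The range comparison discussed here can be expressed as a simple check on each annual split. The sketch below is only illustrative: the function name `within_calibration_range` and the streamflow values are hypothetical, and strictly positive flows are assumed so that the logarithm is defined.

```python
import numpy as np

def within_calibration_range(calib, valid):
    """True if the log-transformed validation range lies inside the calibration range.

    The logarithm is monotonic, so the verdict is the same as on raw values;
    it is kept here to mirror the log-transformed distributions of Fig. 8.
    """
    calib = np.log(np.asarray(calib, float))
    valid = np.log(np.asarray(valid, float))
    return bool(valid.min() >= calib.min() and valid.max() <= calib.max())

calib_q = [0.2, 1.0, 15.0]   # hypothetical calibration flows (m3/s)
year_a = [0.5, 3.0]          # validation year inside the calibration range
year_b = [0.3, 40.0]         # validation year with an unseen flood peak

inside_a = within_calibration_range(calib_q, year_a)
inside_b = within_calibration_range(calib_q, year_b)
```

A split such as `year_b` requires the model to extrapolate beyond its calibration conditions.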
In this context, the effect of the North Atlantic Oscillation (NAO) on regional hydrological patterns, particularly temperature and precipitation, has been the subject of various studies. López-Moreno et al. (2011) analyzed the effect of the NAO on temperature and precipitation over the Mediterranean mountains, including the Atlas Mountains, and concluded that the NAO strongly impacts snowpack dynamics. This conclusion was confirmed by another study [95] conducted in the Moroccan Atlas Mountains, which also highlighted a possible impact of the NAO on snowpack dynamics through rainfall and temperature. Furthermore, the physiography of this basin is characterized by rugged topography and steep slopes with sparse vegetation, which accentuate the spatial heterogeneity of runoff processes. In addition, the scattered rains and the large infiltration and evapotranspiration rates in the large foothill and plain segment of the catchment favor the spatial and temporal variability of rainfall and discharge at the basin outlet. This explains the high temporal variability of streamflow in the Rheraya sub-basin. Given this high variability, the models tested have difficulty extrapolating beyond their calibration conditions. This suggests a need for longer time series covering a larger spectrum of streamflow and precipitation variations to strengthen the prediction accuracy of the models. Still, the three tested models can overall be considered reliable for daily streamflow prediction based on their mean performance during validation. However, their low performance for certain years reduces their usefulness and points to the need to obtain longer, quality-controlled streamflow time series to better sample the heterogeneous hydrological variability during model calibration. In parallel, further fine-tuning of model hyperparameters could also help to improve the forecasting capacity of the SVR and RF models.
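The per-year validation scores and their mean, as discussed above, come from a leave-one-year-out scheme evaluated with the NSE. The sketch below shows one way to compute them. It is only illustrative: a one-predictor least-squares line stands in for the SVR, RF and MLR models, and the function names, year labels and data values are hypothetical.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations
    of the observations from their mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x (stand-in model only)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def leave_one_year_out(data_by_year):
    """data_by_year maps a year label to (predictor list, observed flow list).
    Each year is held out once; the model is calibrated on all other years."""
    scores = {}
    for held_out, (x_val, y_val) in data_by_year.items():
        x_cal = [v for y, (xs, _) in data_by_year.items() if y != held_out for v in xs]
        y_cal = [v for y, (_, ys) in data_by_year.items() if y != held_out for v in ys]
        a, b = fit_line(x_cal, y_cal)
        scores[held_out] = nse(y_val, [a + b * xi for xi in x_val])
    return scores

# Made-up values: predictor = antecedent flow, target = next-day flow (m3/s).
data = {
    "year_1": ([0.5, 1.0, 2.0, 1.5], [0.6, 1.1, 1.8, 1.4]),
    "year_2": ([0.4, 0.8, 1.2, 0.9], [0.5, 0.7, 1.3, 1.0]),
    "year_3": ([0.3, 3.0, 6.0, 2.0], [0.4, 2.5, 7.0, 2.2]),
}
scores = leave_one_year_out(data)
mean_nse = sum(scores.values()) / len(scores)
print(scores, mean_nse)
```

Reporting both the per-year scores and their mean makes it visible when an acceptable average hides individual years with poor performance.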
A further challenge is model stability under climate change conditions. Temperatures are projected to increase and precipitation to decrease in Morocco, as in other Mediterranean countries [96,97,98], which has a direct impact on river discharge [99]. Using longer records to develop ML models for daily streamflow prediction would thus require updating the model calibration frequently to adapt to the warming and drying climate in the area.
4.3 Practical implications in water resources engineering
The application of SVR, RF, and MLR models improves the accuracy of streamflow prediction to a reasonable extent, thereby helping stakeholders to advance water management strategies, which include efficient irrigation scheduling, drought management, and flood risk reduction [24, 100]. Accurate predictions play a crucial role in adjusting to climate change, guiding infrastructure development, and shaping policy decisions related to the sustainable allocation and utilization of water resources. By providing a foundation for more informed choices regarding infrastructure investments, such as the building of dams and the establishment of early warning systems, the study contributes to addressing the difficulties brought about by water scarcity and competition. Ultimately, the findings of this research support the maintenance of water sustainability and ecosystem conservation in arid regions, as well as efforts to enhance water security, optimize resource utilization, and promote the sustainable growth of at-risk communities and ecosystems.
5 Conclusions
In this study, daily streamflow modelling was conducted in the Rheraya sub-basin of the Moroccan High Atlas range, using observed daily precipitation, antecedent streamflow, and remotely sensed snow cover data as inputs to two machine learning models, SVR and RF, as well as to MLR. A leave-one-year-out cross-validation approach was used for model validation over the period 2003–2016. Next, the mean performance of SVR, RF and MLR was compared in calibration and validation. Finally, the influence of the temporal variability of hydroclimatic conditions on the quality of the streamflow simulations was analyzed, and the following conclusions were obtained: (1) RF, MLR and especially SVR models showed an overall adequate mean performance during validation (NSE: 0.53, 0.54 and 0.59, respectively), making them promising tools for daily streamflow prediction in Rheraya. Still, this performance was lower than that reported in previous daily streamflow modelling studies in both wet and dry regions. (2) The hydrological processes in the Tensift region are highly variable in time, and the resulting heterogeneous streamflow conditions during the 13-year observational period greatly affected model stability and transferability between the calibration periods and individual validation years. The cross-validation procedure demonstrated the unstable predictive performance of the models over the studied period: changing the calibration or validation period can lead to simulations of different accuracy. Consequently, the models are hardly transferable between years. (3) The instability of the model performance is mainly ascribed to the natural temporal variability of hydroclimatic conditions but can also be related to the uncertainty in the measured streamflow; more quality-controlled observations are needed to reduce this source of uncertainty.
(4) In a semi-arid region characterized by a high temporal variability of hydroclimatic conditions, a direct application of ML techniques for surface water modelling may not achieve good accuracy because streamflow conditions are dissimilar over time. It is suggested that under these conditions the calibration period should not be chosen randomly and should be long enough to adequately capture the natural temporal variability of streamflow. If these challenges can be overcome, ML-driven streamflow models would be a useful tool, as they require neither a large amount of hydro-meteorological forcing data nor information on catchment properties, unlike some hydrological models. Hence, ML models could be a helpful tool for runoff monitoring in semi-arid catchments affected by data scarcity. Nevertheless, a rigorous validation of the reliability of this type of model is needed in the other sub-basins of Tensift to evaluate its transferability, given the spatial variability in climate and the resulting hydrological processes. In semi-arid regions characterized by data scarcity, the transferability of ML methods for hydrological modelling from a gauged basin to an ungauged one is of great interest. The aim is to reproduce the hydrological response of an ungauged sub-basin without requiring observations in that sub-basin, by training models in a gauged basin and then applying them to simulate the streamflow of the ungauged one. In Tensift, most of the sub-basins are ungauged and even the existing measurements are subject to errors. While this approach can help generate the hydrological response of these sub-basins, it requires a rigorous evaluation beforehand.
In conclusion, this research presents an initial investigation of the potential of ML in semi-arid hydrology, paving the way for future enhancements such as hybrid models and ensemble techniques, which could contribute significantly to efficient water resource management in semi-arid regions.
References
Viviroli D, Dürr HH, Messerli B, Meybeck M, Weingartner R. Mountains of the world, water towers for humanity: typology, mapping, and global significance. Water Resour Res. 2007. https://doi.org/10.1029/2006WR005653.
Dettinger M. Impacts in the third dimension. Nat Geosci. 2014;7:166–7. https://doi.org/10.1038/ngeo2096.
Boudhar A, Hanich L, Boulet G, Berjamy B, Chehbouni A. Evaluation of the snowmelt runoff model in the Moroccan high atlas mountains using two snow-cover estimates. Hydrol Sci J. 2009. https://doi.org/10.1623/hysj.54.6.1094.
Marchane A, Jarlan L, Hanich L, Boudhar A, Gascoin S, Tavernier A, Filali N, Page ML, Hagolle O, Berjamy B. Assessment of daily MODIS snow cover products to monitor snow cover dynamics over the Moroccan atlas mountain range. Remote Sens Environ. 2015;160:72–86. https://doi.org/10.1016/j.rse.2015.01.002.
Fniguire F, Laftouhi NE, Saidi ME, Zamrane Z, Himer HE, Khalil N. Spatial and temporal analysis of the drought vulnerability and risks over eight decades in a semi-arid region (Tensift Basin: Morocco). Theor Appl Climatol. 2017;130:321–30. https://doi.org/10.1007/s00704-016-1873-z.
Hadri A, El M, Saidi M, Boudhar A. Multiscale drought monitoring and comparison using remote sensing in a Mediterranean arid region: a case study from west-central Morocco. 2021. https://doi.org/10.1007/s12517-021-06493-w.
Driouech F, ElRhaz K, Moufouma-Okia W, Arjdal K, Balhane S. Assessing future changes of climate extreme events in the CORDEX-MENA region using regional climate model ALADIN-climate. Earth Syst Environ. 2020;4:477–92. https://doi.org/10.1007/s41748-020-00169-3.
Marchane A, Tramblay Y, Hanich L, Ruelland D, Jarlan L. Climate change impacts on surface water resources in the Rheraya catchment (high atlas, morocco). Hydrol Sci J. 2017;62:979–95. https://doi.org/10.1080/02626667.2017.1283042.
Chaponnière A, Boulet G, Chehbouni A, Aresmouk M. Understanding hydrological processes with scarce data in a mountain environment. Hydrol Process. 2008;22:1908–21. https://doi.org/10.1002/hyp.6775.
Shalamu A. Monthly and seasonal streamflow forecasting in the Rio Grande Basin. 2009.
Sadio CAAS, Faye C. Evaluation of extreme flow characteristics in the Casamance watershed upstream of Kolda using the IHA/RVA method. Int J Sustain Energy Environ Res. 2023;12:31–45. https://doi.org/10.18488/13.v12i2.3584.
Boudhar A, Hanich L, Boulet G, Duchemin B, Chehbouni A. Apport des données spot-vegetation à la modélisation de la fonte de neige dans le haut atlas marocain. 2010. https://doi.org/10.13140/2.1.4847.4569.
Hajhouji Y, Simonneaux V, Gascoin S, Fakir Y, Richard B, Chehbouni A, Boudhar A. Rainfall-runoff modeling and hydrological regime analysis of a semi-arid snow-influenced catchment case of the Rheraya river (high atlas, Morocco). Houille Blanche. 2018;6368:49–62. https://doi.org/10.1051/lhb/2018032.
Zamrane Z, Turki I, Laignel B, Mahé G, Laftouhi NE. Characterization of the interannual variability of precipitation and streamflow in Tensift and Ksob basins (Morocco) and links with the NAO. Atmosphere. 2016. https://doi.org/10.3390/atmos7060084.
Boudhar A, Ouatiki H, Bouamri H, Lebrini Y, Karaoui I, Hssaisoune M, Arioua A, Benabdelouahab T. Hydrological response to snow cover changes using remote sensing over the Oum Er Rbia upstream basin, Morocco, 2020; pp. 95–102. Springer. https://doi.org/10.1007/978-3-030-21166-0_9
Boudhar A, Baba MW, Marchane A, Ouatiki H, Bouamri H, Hanich L, Chehbouni A. Remote sensing of African mountains: geospatial tools toward sustainability. 2022; 1–247. https://doi.org/10.1007/978-3-031-04855-5
Zhou Y, Cui Z, Lin K, Sheng S, Chen H, Guo S, Xu CY. Short-term flood probability density forecasting using a conceptual hydrological model with machine learning techniques. J Hydrol. 2022;604: 127255. https://doi.org/10.1016/j.jhydrol.2021.127255.
Mohana H. Intelligent system design—speech generation. 2011.
Zemzami M, Benaabidate L. Improvement of artificial neural networks to predict daily streamflow in a semi-arid area. Hydrol Sci J. 2016;61:1801–12. https://doi.org/10.1080/02626667.2015.1055271.
Tayyab M, Zhou J, Adnan R, Zeng X. Application of artificial intelligence method coupled with discrete wavelet transform method, 2017;vol. 107, pp. 212–217. Elsevier. https://doi.org/10.1016/j.procs.2017.03.081.
Muhammad R, Yuan X, Kisi O, Yuan Y. Streamflow forecasting using artificial neural network and support vector machine models. Am Sci Res J Eng Technol Sci. 2017;29:286–94.
Niu W, Feng Z. Evaluating the performances of several artificial intelligence methods in forecasting daily streamflow time series for sustainable water resources management. Sustain Cities Soc. 2021;64: 102562. https://doi.org/10.1016/j.scs.2020.102562.
Chakravarthy VVSSS, Flores-Fuentes W, Bhateja V, Biswal BN. Advances in micro-electronics. Embedded Syst IoT. 2022;1:443. https://doi.org/10.1007/978-981-16-8550-7.
Nifa K, Boudhar A, Ouatiki H, Elyoussfi H, Bargam B, Chehbouni A. Deep learning approach with LSTM for daily streamflow prediction in a semi-arid area: a case study of Oum Er-Rbia river basin, Morocco. Water. 2023. https://doi.org/10.3390/w15020262.
Yaseen ZM, El-shafie A, Jaafar O, Afan HA, Sayl KN. Artificial intelligence based models for stream-flow forecasting: 2000–2015. J Hydrol Reg. 2015;530:829–44. https://doi.org/10.1016/j.jhydrol.2015.10.038.
Chiang YM, Chang LC, Chang FJ. Comparison of static-feedforward and dynamic-feedback neural networks for rainfall-runoff modeling. J Hydrol. 2004;290:297–311. https://doi.org/10.1016/j.jhydrol.2003.12.033.
Hsu KL, Gupta HV, Gao X, Sorooshian S, Imam B. Self-organizing linear output map (solo): an artificial neural network suitable for hydrologic modeling and analysis. Water Resour Res. 2002;38:38–13817. https://doi.org/10.1029/2001wr000795.
Moradkhani H, Hsu KL, Gupta HV, Sorooshian S. Improved streamflow forecasting using self-organizing radial basis function artificial neural networks. J Hydrol. 2004;295:246–62. https://doi.org/10.1016/j.jhydrol.2004.03.027.
Mutlu E, Chaubey I, Hexmoor H, Bajwa SG. Comparison of artificial neural network models for hydrologic predictions at multiple gauging stations in an agricultural watershed. Hydrol Process. 2008;22:5097–106. https://doi.org/10.1002/hyp.7136.
Kagoda PA, Ndiritu J, Ntuli C, Mwaka B. Application of radial basis function neural networks to short-term streamflow forecasting. Phys Chem Earth. 2010;35:571–81. https://doi.org/10.1016/j.pce.2010.07.021.
Mehr AD, Kahya E, Sahin A, Nazemosadat MJ. Successive-station monthly streamflow prediction using different artificial neural network algorithms. Int J Environ Sci Technol. 2015;12:2191–200. https://doi.org/10.1007/s13762-014-0613-0.
Vapnik V. Support-vector networks. IEEE Expert-Intell Syst Their Appl. 1995;7:63–72. https://doi.org/10.1109/64.163674.
Dibike YB, Velickov S, Solomatine D, Abbott MB. Model induction with support vector machines: introduction and applications. J Comput Civ Eng. 2001;15:208–16. https://doi.org/10.1061/(asce)0887-3801(2001)15:3(208).
Asefa T, Kemblowski M, McKee M, Khalil A. Multi-time scale stream flow predictions: the support vector machines approach. J Hydrol. 2006;318:7–16. https://doi.org/10.1016/j.jhydrol.2005.06.001.
Noori R, Karbassi AR, Moghaddamnia A, Han D, Zokaei-Ashtiani MH, Farokhnia A, Gousheh MG. Assessment of input variables determination on the SVM model performance using PCA, gamma test, and forward selection techniques for monthly stream flow prediction. J Hydrol. 2011;401:177–89. https://doi.org/10.1016/j.jhydrol.2011.02.021.
Sudheer C, Anand N, Panigrahi BK, Mathur S. Streamflow forecasting by SVM with quantum behaved particle swarm optimization. Neurocomputing. 2013;101:18–23. https://doi.org/10.1016/j.neucom.2012.07.017.
He Z, Wen X, Liu H, Du J. A comparative study of artificial neural network, adaptive neuro fuzzy inference system and support vector machine for forecasting river flow in the semiarid mountain region. J Hydrol. 2014;509:379–86. https://doi.org/10.1016/j.jhydrol.2013.11.054.
Nourani V, Baghanam AH, Adamowski J, Kisi O. Applications of hybrid wavelet-Artificial Intelligence models in hydrology: a review. Elsevier. 2014. https://doi.org/10.1016/j.jhydrol.2014.03.057.
Oborie E, Rowland ED. Flood influence using GIS and remote sensing based morphometric parameters: a case study in Niger delta region. J Asian Sci Res. 2023;13:1–15. https://doi.org/10.55493/5003.v13i1.4719.
Abdullah D, Gartsiyanova K, Qizi KEMM, Javlievich EA, Bulturbayevich MB, Zokirova G, Nordin MN. An artificial neural networks approach and hybrid method with wavelet transform to investigate the quality of Tallo River, Indonesia. Casp J Environ Sci. 2023;21:647–56. https://doi.org/10.22124/CJES.2023.6942.
Mehr AD, Gandomi AH. MSGP-LASSO: an improved multi-stage genetic programming model for streamflow prediction. Inf Sci. 2021;561:181–95. https://doi.org/10.1016/j.ins.2021.02.011.
Bell BA, Hughes PD, Fletcher WJ, Cornelissen HL, Rhoujjati A, Hanich L, Braithwaite RJ. Climate of the Marrakech high atlas, Morocco: temperature lapse rates and precipitation gradient from piedmont to summits. Arct Antarct Alp Res. 2022;54:78–95. https://doi.org/10.1080/15230430.2022.2046897.
Jarlan L, Khabba S, Er-Raki S, Page ML, Hanich L, Fakir Y, Merlin O, Mangiarotti S, Gascoin S, Ezzahar J, Kharrou MH, Berjamy B, Saaïdi A, Boudhar A, Benkaddour A, Laftouhi N, Abaoui J, Tavernier A, Boulet G, Simonneaux V, Driouech F, Adnani ME, Fazziki AE, Amenzou N, Raibi F, Mandour AE, Ibouh H, Dantec VL, Habets F, Tramblay Y, Mougenot B, Leblanc M, Faïz ME, Drapeau L, Coudert B, Hagolle O, Filali N, Belaqziz S, Marchane A, Szczypta C, Toumi J, Diarra A, Aouade G, Hajhouji Y, Nassah H, Bigeard G, Chirouze J, Boukhari K, Abourida A, Richard B, Fanise P, Kasbani M, Chakir A, Zribi M, Marah H, Naimi A, Mokssit A, Kerr Y, Escadafal R. Remote sensing of water resources in semi-arid Mediterranean areas: the joint international laboratory trema. Int J Remote Sens. 2015;36:4879–917. https://doi.org/10.1080/01431161.2015.1093198.
Bouamri H, Kinnard C, Boudhar A, Gascoin S, Hanich L, Chehbouni A. Modis does not capture the spatial heterogeneity of snow cover induced by solar radiation. Front Earth Sci. 2021;9:1–19. https://doi.org/10.3389/feart.2021.640250.
Bouamri H, Boudhar A, Gascoin S, Kinnard C. Performance of temperature and radiation index models for point-scale snow water equivalent (SWE) simulations in the Moroccan high atlas mountains. Hydrol Sci J. 2018;63:1844–62. https://doi.org/10.1080/02626667.2018.1520391.
Riggs GA, Hall DK, Román MO. Overview of NASA’s MODIS and visible infrared imaging radiometer suite (VIIRS) snow-cover earth system data records. Earth Syst Sci Data. 2017;9:765–77. https://doi.org/10.5194/essd-9-765-2017.
Hall DK, Riggs GA, Román MO. VIIRS snow cover algorithm theoretical basis document (ATBD). 2016.
Chen ST, Yu PS, Tang YH. Statistical downscaling of daily precipitation using support vector machines and multivariate analysis. J Hydrol. 2010;385:13–22. https://doi.org/10.1016/j.jhydrol.2010.01.021.
Zhao G, Pang B, Xu Z, Xu L. A hybrid machine learning framework for real-time water level prediction in high sediment load reaches. J Hydrol. 2020;581: 124422. https://doi.org/10.1016/j.jhydrol.2019.124422.
Liu D, Zhang Y, Zhang J, Xiong L, Liu P, Chen H, Yin J. Rainfall estimation using measurement report data from time-division long term evolution networks. J Hydrol. 2021;600: 126530. https://doi.org/10.1016/j.jhydrol.2021.126530.
Meng E, Huang S, Huang Q, Fang W, Wu L, Wang L. A robust method for non-stationary streamflow prediction based on improved EMD-SVM model. J Hydrol. 2019;568:462–78. https://doi.org/10.1016/j.jhydrol.2018.11.015.
Yu X, Wang Y, Wu L, Chen G, Wang L, Qin H. Comparison of support vector regression and extreme gradient boosting for decomposition-based data-driven 10-day streamflow forecasting. J Hydrol. 2020;582: 124293. https://doi.org/10.1016/j.jhydrol.2019.124293.
Nachappa TG, Piralilou ST, Gholamnia K, Ghorbanzadeh O, Rahmati O, Blaschke T. Flood susceptibility mapping with machine learning, multi-criteria decision analysis and ensemble using dempster Shafer theory. J Hydrol. 2020;590: 125275. https://doi.org/10.1016/j.jhydrol.2020.125275.
Ghorbanpour AK, Hessels T, Moghim S, Afshar A. Comparison and assessment of spatial downscaling methods for enhancing the accuracy of satellite-based precipitation over lake Urmia basin. J Hydrol. 2021;596: 126055. https://doi.org/10.1016/j.jhydrol.2021.126055.
Kumar A, Ramsankaran RAAJ, Brocca L, Muñoz-Arriola F. A simple machine learning approach to model real-time streamflow using satellite inputs: demonstration in a data scarce catchment. J Hydrol. 2021. https://doi.org/10.1016/j.jhydrol.2021.126046.
Liu X, Liu TQ, Feng P. Long-term performance prediction framework based on XGBoost decision tree for pultruded FRP composites exposed to water, humidity and alkaline solution. Compos Struct. 2022;284: 115184. https://doi.org/10.1016/j.compstruct.2022.115184.
Breiman Shang J, Zhu Q, Ling C, **. Water Resour Manag. 2017;31:2761–75. https://doi.org/10.1007/s11269-017-1660-3.
Huang BFF, Boutros PC. The parameter sensitivity of random forests. BMC Bioinform. 2016;17:1–13. https://doi.org/10.1186/s12859-016-1228-x.
Guan H, Li J, Chapman M, Deng F, Ji Z, Yang X. Integration of orthoimagery and lidar data for object-based urban thematic mapping using random forests. Int J Remote Sens. 2013;34:5166–86. https://doi.org/10.1080/01431161.2013.788261.
Sheykhmousa M, Mahdianpari M, Ghanbari H, Mohammadimanesh F, Ghamisi P, Homayouni S. Support vector machine versus random forest for remote sensing image classification: a meta-analysis and systematic review. IEEE J Sel Top Appl Earth Observ Remote Sens. 2020;13:6308–25. https://doi.org/10.1109/JSTARS.2020.3026724.
Sahour H, Gholami V, Vazifedan M. A comparative analysis of statistical and machine learning techniques for mapping the spatial distribution of groundwater salinity in a coastal aquifer. J Hydrol. 2020;591: 125321. https://doi.org/10.1016/j.jhydrol.2020.125321.
Chandrashekar G, Sahin F. A survey on feature selection methods. Comput Electr Eng. 2014;40:16–28. https://doi.org/10.1016/j.compeleceng.2013.11.024.
Allen DM. The relationship between variable selection and data augmentation and a method for prediction. 1974;0–3.
Stone M. Cross-validatory choice and assessment of statistical predictions. J R Stat Soc B. 1974;36:111–33. https://doi.org/10.1111/j.2517-6161.1974.tb00994.x.
Geisser S. The predictive sample reuse method with applications. J Am Stat Assoc. 1975;70:320–8. https://doi.org/10.1080/01621459.1975.10479865.
Tongal H, Booij MJ. Simulation and forecasting of streamflows using machine learning models coupled with base flow separation. J Hydrol. 2018;564:266–82. https://doi.org/10.1016/j.jhydrol.2018.07.004.
Hanich L, Chehbouni A, Gascoin S, Boudhar A, Jarlan L, Tramblay Y, Boulet G, Marchane A, Baba MW, Kinnard C, Simonneaux V, Fakir Y, Bouchaou L, Leblanc M, Page ML, Bouamri H, Er-Raki S, Khabba S. Snow hydrology in the Moroccan atlas mountains. J Hydrol Reg Stud. 2022;42: 101101. https://doi.org/10.1016/j.ejrh.2022.101101.
Moriasi DN, Arnold JG, Liew MWV, Bingner RL, Harmel RD, Veith TL. Model evaluation guidelines for systematic quantification of accuracy in watershed simulations. Trans ASABE. 2007;50:885–900.
Buyukyildiz M. Monthly streamflow time series modelling of Coruh River 2014. www.fce.vutbr.cz/ekr/PBE.
Papacharalampous GA, Tyralis H. Evaluation of random forests and prophet for daily streamflow forecasting. Adv Geosci. 2018;45:201–8. https://doi.org/10.5194/adgeo-45-201-2018.
Li J, Wang Z, Lai C, Zhang Z. Tree-ring-width based streamflow reconstruction based on the random forest algorithm for the source region of the Yangtze River, China. CATENA. 2019;183: 104216. https://doi.org/10.1016/j.catena.2019.104216.
Aichouri I, Hani A, Bougherira N, Djabri L, Chaffai H, Lallahem S. River flow model using artificial neural networks. 2015;vol. 74, pp. 1007–14. Elsevier. https://doi.org/10.1016/j.egypro.2015.07.832.
Yaseen ZM, Jaafar O, Deo RC, Kisi O, Adamowski J, Quilty J, El-Shafie A. Stream-flow forecasting using extreme learning machines: a case study in a semi-arid region in Iraq. J Hydrol. 2016;542:603–14. https://doi.org/10.1016/j.jhydrol.2016.09.035.
Sahoo A, Ghose DK. Imputation of missing precipitation data using KNN, SOM, RF, and FNN. Soft Comput. 2022;26:5919–36. https://doi.org/10.1007/s00500-022-07029-4.
Sahoo A, Samantaray S, Ghose DK. Multilayer perceptron and support vector machine trained with grey wolf optimiser for predicting floods in Barak River, India. J Earth Syst Sci. 2022. https://doi.org/10.1007/s12040-022-01815-2.
Adib MNM, Harun S. Machine learning algorithms with hydro-meteorological data for monthly streamflow forecasting of Kurau River, Malaysia. In: Proceedings of the 5th international conference on water resources (ICWR) 2, 2021;29–41. https://doi.org/10.1007/978-981-99-3577-2.
Mehraein M, Mohanavelu A, Naganna SR, Kulls C, Kisi O. Monthly streamflow prediction by metaheuristic regression approaches considering satellite precipitation data. Water. 2022. https://doi.org/10.3390/w14223636.
Fang W, Zhou J, Jia BJ, Gu L, Xu Z. Study on the evolution law of performance of mid- to long-term streamflow forecasting based on data-driven models. Sustain Cities Soc. 2023;88: 104277. https://doi.org/10.1016/j.scs.2022.104277.
Zhenghao Z, Zhang Q, Singh VP. Univariate streamflow forecasting using commonly used data-driven models: literature review and case study. Hydrol Sci J. 2018;63:1091–111. https://doi.org/10.1080/02626667.2018.1469756.
Sun N, Zhang S, Peng T, Zhang N, Zhou J, Zhang H. Multi-variables-driven model based on random forest and gaussian process regression for monthly streamflow forecasting. Water. 2022;14:1828. https://doi.org/10.3390/w14111828.
Tofiq YM, Latif SD, Ahmed AN, Kumar P, El-Shafie A. Optimized model inputs selections for enhancing river streamflow forecasting accuracy using different artificial intelligence techniques. Water Resour Manag. 2022;36:5999–6016. https://doi.org/10.1007/s11269-022-03339-2.
Rezaie-Balf M, Nowbandegani SF, Samadi SZ, Fallah H, Alaghmand S. An ensemble decomposition-based artificial intelligence approach for daily streamflow prediction. Water. 2019. https://doi.org/10.3390/w11040709.
Azimi M, Fatemah G, Massoud T, Abrishamchi A. World environmental and water resources congress 2011: Bearing knowledge for sustainability—proceedings of the 2011 world environmental and water resources congress. 2011;1184–93.
Guo J, Zhou J, Qin H, Zou Q, Li Q. Monthly streamflow forecasting based on improved support vector machine model. Expert Syst Appl. 2011;38:13073–81. https://doi.org/10.1016/j.eswa.2011.04.114.
Suhartono Shabri A. Prévision de débit à l’aide de machines à vecteurs de support en moindres carrés. Hydrol Sci J. 2012;57:1275–93. https://doi.org/10.1080/02626667.2012.714468.
Huang S, Chang J, Huang Q, Chen Y. Monthly streamflow prediction using modified EMD-based support vector machine. J Hydrol. 2014;511:764–75. https://doi.org/10.1016/j.jhydrol.2014.01.062.
Granata F, Nunno FD, Marinis G. Stacked machine learning algorithms and bidirectional long short-term memory networks for multi-step ahead streamflow forecasting: A comparative study. J Hydrol. 2022;613: 128431. https://doi.org/10.1016/j.jhydrol.2022.128431.
Akbarian M, Saghafian B, Golian S. Monthly streamflow forecasting by machine learning methods using dynamic weather prediction model outputs over Iran. J Hydrol. 2023;620: 129480. https://doi.org/10.1016/j.jhydrol.2023.129480.
Simonneaux V, Hanich L, Boulet G, Thomas S. Modelling runoff in the rheraya catchment (high atlas, morocco) using the simple daily model gr4j. trends over the last decades. 13th IWRA World Water Congress, Montpellier, France 2008.
Boudhar A. Télédétection du manteau neigeux et modélisation de la contribution des eaux de fonte des neiges aux débits des oueds du haut atlas de marrakech. 2009;215.
Bennani O, Brahim YA, Saidi MEM, Fniguire F, Author C. Variability of surface water resources and extreme flows under climate change conditions in arid and Mediterranean area: case of Tensift watershed, Morocco. 2016;9:165–74.
Riad S, Mania J, Bouchaou L, Najjar Y. Rainfall-runoff model using an artificial neural network approach. Math Comput Model. 2004;40:839–46. https://doi.org/10.1016/j.mcm.2004.10.012.
Khomsi K, Mahe GIL, Sinan M, Snoussi M. Hydro-climatic variability in two Moroccan basins: comparative analysis of temperature, rainfall and runoff regimes. 2013;2013:183–90.
Marchane A, Jarlan L, Boudhar A, Tramblay Y, Hanich L. Linkages between snow cover, temperature and rainfall and the north Atlantic oscillation over morocco. Clim Res. 2016;69:229–38. https://doi.org/10.3354/cr01409.
Ouatiki H, Boudhar A, Ouhinou A, Arioua A, Hssaisoune M, Bouamri H, Benabdelouahab T. Trend analysis of rainfall and drought over the Oum Er-Rbia river basin in morocco during 1970–2010. Arab J Geosci. 2019. https://doi.org/10.1007/s12517-019-4300-9.
Schilling J, Freier KP, Hertig E, Scheffran J. Agriculture, ecosystems and environment climate change, vulnerability and adaptation in North Africa with focus on morocco. Agric Ecosyst Environ. 2012;156:12–26. https://doi.org/10.1016/j.agee.2012.04.021.
Singla S, Mahe GIL, Dieulin C, Driouech F, Milan M, Zohra F, Guelai EL, Ardoin-bardin S. Evolution des relations pluie-debit sur des bassins versants du maroc. 2010. 1999.
Zhao F, Zongxue XU, Lu Z, Depeng ZUO. Streamflow response to climate variability and human activities in the upper catchment of the yellow river basin. 2009. 52. https://doi.org/10.1007/s11431-009-0354-3.
Morote F, Olcina J, Hernández M. The use of non-conventional water resources as a means of adaptation to drought and climate change in semi-arid regions: South-eastern Spain. Water. 2019. https://doi.org/10.3390/w11010093.
Acknowledgements
This work is supported by the research program “MorSnow”, within CRSA, Mohammed VI Polytechnic University (UM6P), Morocco (Specific agreement no 39 between OCP S.A and UM6P), and the research program (GEANTech) funded by the Moroccan Ministry of Higher Education, Scientific Research, and Innovation and the OCP Foundation through the APRD. The authors are grateful to the Tensift Hydraulic Basin Agency and LMI for providing the ground hydroclimatic data used in this study.
Ethics declarations
Competing interests
The authors declare no conflict of interest.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Bargam, B., Boudhar, A., Kinnard, C. et al. Evaluation of the support vector regression (SVR) and the random forest (RF) models accuracy for streamflow prediction under a data-scarce basin in Morocco. Discov Appl Sci 6, 306 (2024). https://doi.org/10.1007/s42452-024-05994-z
DOI: https://doi.org/10.1007/s42452-024-05994-z